Abstract

Over the last decade, kernel methods for nonlinear processing have successfully been used in the 
machine learning community. The primary mathematical tool employed in these methods is the notion 
of the Reproducing Kernel Hilbert Space. However, so far, the emphasis has been on batch techniques. Only
recently have online techniques been considered in the context of adaptive signal processing
tasks. Moreover, these efforts have focused only on real valued data sequences. To the best of our
knowledge, no adaptive kernel-based strategy has been developed, so far, for complex valued signals. 
Furthermore, although the real reproducing kernels are used in an increasing number of machine learning 
problems, complex kernels have not, yet, been used, in spite of their potential interest in applications 
that deal with complex signals, with Communications being a typical example. In this paper, we present 
a general framework to attack the problem of adaptive filtering of complex signals, using either real 
reproducing kernels, taking advantage of a technique called complexification of real RKHSs, or complex 
reproducing kernels, highlighting the use of the complex gaussian kernel. 

In order to derive gradients of operators that need to be defined on the associated complex RKHSs, 
we employ the powerful tool of Wirtinger's Calculus, which has recently attracted attention in the signal 
processing community. Wirtinger's calculus simplifies computations and offers an elegant tool for treating 
complex signals. To this end, in this paper, the notion of Wirtinger's calculus is extended, for the first
time, to include complex RKHSs, and the extension is used to derive several realizations of the Complex
Kernel Least-Mean-Square (CKLMS) algorithm. Experiments verify that the CKLMS offers significant performance
improvements over several linear and nonlinear algorithms, when dealing with nonlinearities.

Extension of Wirtinger's Calculus to
Reproducing Kernel Hilbert Spaces and the 
Complex Kernel LMS 

I. Introduction 

Processing in Reproducing Kernel Hilbert Spaces (RKHSs), in the context of online learning, is 
gaining in popularity within the Machine Learning and Signal Processing communities [1]-[6]. The main
advantage of mobilizing the tool of RKHSs is that the original nonlinear task is "transformed" into a linear
one, which can be solved by employing an easier "algebra". Moreover, different types of nonlinearities
can be treated in a unifying way, with no effect on the mathematical derivation of the algorithms, except at
the final implementation stage. The main concepts of this procedure can be summarized in the following
two steps: 1) Map the finite dimensional input data from the input space $F$ (usually $F \subset \mathbb{R}^\nu$) into
a higher dimensional (possibly infinite dimensional) RKHS $\mathcal{H}$ and 2) Perform a linear processing (e.g., adaptive
filtering) on the mapped data in $\mathcal{H}$. The procedure is equivalent to a nonlinear processing (nonlinear
filtering) in $F$.

An alternative way of describing this process is through the popular kernel trick [7], [8]: Given an
algorithm, which can be formulated in terms of dot products, one can construct an alternative algorithm
by replacing each one of the dot products with a positive definite kernel $\kappa$. The specific choice of
kernel implicitly defines a RKHS with an appropriate inner product. Furthermore, the choice of kernel
also defines the type of nonlinearity that underlies the model to be used. The main representatives of
this class of algorithms are the celebrated support vector machines (SVMs), which have dominated the
research in machine learning over the last decade [9]. Besides SVMs and the more recent applications in
adaptive filtering, there is a plethora of other scientific domains that have gained from adopting kernel
methods (e.g., image processing and denoising [10], [11], principal component analysis [12], clustering
[13], etc.).

In classification tasks (which have been the dominant applications of kernel methods) the use of 
complex reproducing kernels is meaningless, since no arrangement can be derived in complex domains 
and the necessary separating hypersurfaces cannot be defined. Consequently, all known kernel based 
applications, as they emerged from the specific background, use real-valued kernels and they are able to
deal with real valued data sequences only. To our knowledge, no kernel-based strategy has been developed,
so far, that is able to effectively deal with complex valued signals.

In this paper, we present a general framework to address the problem of adaptive filtering of complex 
signals, using either real reproducing kernels, taking advantage of a technique called complexification 
of real RKHSs, or complex reproducing kernels, highlighting mostly the use of the complex gaussian 
kernel. Although the real Gaussian RBF kernel has become quite popular and has been used in many
applications, the complex Gaussian RBF kernel, while known to mathematicians (especially those
working on Reproducing Kernel Hilbert Spaces or Functional Analysis), has remained in relative obscurity
in the Machine Learning and Signal Processing communities. Even though the presented framework has
a broad range and may be applied to generalize a wide variety of kernel methods to the complex domain,
this work focuses on the recently developed Kernel LMS (KLMS) [1], [14].

To compute the gradients of cost functions that are defined on the complex RKHSs, the principles 
of Wirtinger's calculus are employed. Wirtinger's calculus [15] has recently attracted attention in the 
signal processing community, mainly in the context of complex adaptive filtering [16]-[23], as a means
of computing, in an elegant way, gradients of real valued cost functions defined on complex domains
($\mathbb{C}^\nu$). To this end, the main ideas and theorems of Wirtinger's calculus are generalized to general complex
Hilbert spaces for the first time.

To summarize, the main contributions of this paper are: a) the development of a wide framework that
allows real-valued kernel algorithms to be extended to treat complex data effectively, taking advantage
of a technique called complexification of real RKHSs, b) the elevation from obscurity of the complex Gaussian
kernel as a tool for kernel based adaptive processing of complex signals, c) the extension of Wirtinger's
Calculus to complex RKHSs as a means for an elegant and efficient computation of the gradients
that are involved in the derivation of adaptive learning algorithms, and d) the development of several
realizations of the Complex Kernel LMS (CKLMS) algorithm, by exploiting the extension of Wirtinger's
calculus and the generated complex RKHSs.

The paper is organized as follows. We start with an introduction to RKHSs in Section II, which includes
real and complex kernels, before we briefly review the KLMS algorithm in Section III. In Section IV, we
describe the complexification procedure of a real RKHS, which provides a framework to develop complex
kernel methods, based on popular real valued reproducing kernels (e.g., Gaussian, polynomial, etc.). A
brief introduction to Wirtinger's Calculus in finite dimensional spaces can be found in Section V. The
main notions of the extended Wirtinger's Calculus on general Hilbert spaces are summarized in Section
VI and the CKLMS is developed thereafter in Section VII. Finally, experimental results and conclusions
are provided in Sections VIII and IX. Throughout the paper, we will denote the sets of all natural, real
and complex numbers by $\mathbb{N}$, $\mathbb{R}$ and $\mathbb{C}$, respectively. Vector or matrix valued quantities appear in boldfaced
symbols.

II. Reproducing Kernel Hilbert Spaces 

In this section, we briefly describe the theory of Reproducing Kernel Hilbert Spaces. Since we are
interested in both real and complex kernels, we recall the basic facts on RKHSs associated with a general
field $\mathbb{F}$, which can be either $\mathbb{R}$ or $\mathbb{C}$. However, we highlight the basic differences between the two cases.
The material presented here may be found in more detail in [24] and [25].

A. Basic Definitions 

Given a function $\kappa : X \times X \to \mathbb{F}$ and $x_1, \dots, x_N \in X$, the matrix$^1$ $K = (K_{i,j})^N$ with elements
$K_{i,j} = \kappa(x_i, x_j)$, for $i, j = 1, \dots, N$, is called the Gram matrix (or kernel matrix) of $\kappa$ with respect to
$x_1, \dots, x_N$. A Hermitian matrix $K = (K_{i,j})^N$ satisfying

$$c^H \cdot K \cdot c = \sum_{i=1,j=1}^{N,N} c_i^* c_j K_{i,j} \geq 0,$$

for all $c_i \in \mathbb{F}$, $i = 1, \dots, N$, where the notation $^*$ denotes the conjugate element, is called Positive Definite.
In matrix analysis literature, this is the definition of a positive semidefinite matrix. However, since this is a
rather cumbersome term and the distinction between positive definite and positive semidefinite matrices is
not important in this paper, we employ the term positive definite in the way presented here. Furthermore,
the term positive definite was introduced for the first time by Mercer in the kernel context (see [26]). Let
$X$ be a nonempty set. Then a function $\kappa : X \times X \to \mathbb{F}$, which for all $N \in \mathbb{N}$ and all $x_1, \dots, x_N \in X$
gives rise to a positive definite Gram matrix $K$, is called a Positive Definite Kernel. In the following, we
will frequently refer to a positive definite kernel simply as kernel.

Next, consider a linear class $\mathcal{H}$ of complex valued functions $f$ defined on a set $X$. Suppose further,
that in $\mathcal{H}$ we can define an inner product $\langle \cdot, \cdot \rangle_{\mathcal{H}}$ with corresponding norm $\| \cdot \|_{\mathcal{H}}$ and that $\mathcal{H}$ is complete
with respect to that norm, i.e., $\mathcal{H}$ is a Hilbert space. We call $\mathcal{H}$ a Reproducing Kernel Hilbert Space
(RKHS), if for all $y \in X$ the evaluation functional $T_y : \mathcal{H} \to \mathbb{F} : T_y(f) = f(y)$ is a linear continuous
(or, equivalently, bounded) operator. If this is true, then by the Riesz representation theorem, for all
$y \in X$ there is a function $g_y \in \mathcal{H}$ such that $T_y(f) = f(y) = \langle f, g_y \rangle_{\mathcal{H}}$. The function $\kappa : X \times X \to \mathbb{F} :
\kappa(x, y) = g_y(x)$ is called a reproducing kernel of $\mathcal{H}$. It can be easily proved that the function $\kappa$ is a
positive definite kernel.

$^1$ The term $(K_{i,j})^N$ denotes a square $N \times N$ matrix.

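Alternatively, we can define a RKHS as a Hilbert space $\mathcal{H}$ for which there exists a function
$\kappa : X \times X \to \mathbb{F}$ with the following two important properties:

1) For every $x \in X$, $\kappa(\cdot, x)$ belongs to $\mathcal{H}$.

2) $\kappa$ has the so called reproducing property, i.e.,

$$f(x) = \langle f, \kappa(\cdot, x) \rangle_{\mathcal{H}}, \text{ for all } f \in \mathcal{H}, \tag{1}$$

in particular $\kappa(x, y) = \langle \kappa(\cdot, y), \kappa(\cdot, x) \rangle_{\mathcal{H}}$.

It has been shown (see [24]) that to every positive definite kernel $\kappa$ there corresponds one and only
one class of functions $\mathcal{H}$ with a uniquely determined inner product in it, forming a Hilbert space and
admitting $\kappa$ as a reproducing kernel. In fact, the kernel $\kappa$ produces the entire space $\mathcal{H}$, i.e.,
$\mathcal{H} = \overline{\mathrm{span}\{\kappa(x, \cdot) \,|\, x \in X\}}$.$^2$ The map $\Phi : X \to \mathcal{H} : \Phi(x) = \kappa(\cdot, x)$ is called the feature map of $\mathcal{H}$. Recall,
that in the case of complex Hilbert spaces (i.e., $\mathbb{F} = \mathbb{C}$) the inner product is sesqui-linear (i.e., linear in
one argument and antilinear in the other) and Hermitian:

$$\langle af + bg, h \rangle_{\mathcal{H}} = a \langle f, h \rangle_{\mathcal{H}} + b \langle g, h \rangle_{\mathcal{H}},$$
$$\langle f, ag + bh \rangle_{\mathcal{H}} = a^* \langle f, g \rangle_{\mathcal{H}} + b^* \langle f, h \rangle_{\mathcal{H}},$$
$$\langle f, g \rangle_{\mathcal{H}}^* = \langle g, f \rangle_{\mathcal{H}},$$

for all $f, g, h \in \mathcal{H}$, and $a, b \in \mathbb{C}$. In the real case, the condition $\kappa(x, y) = \langle \kappa(\cdot, y), \kappa(\cdot, x) \rangle_{\mathcal{H}}$ may
be replaced by $\kappa(x, y) = \langle \kappa(\cdot, x), \kappa(\cdot, y) \rangle_{\mathcal{H}}$. However, since in the complex case the inner product is
Hermitian, the aforementioned condition is equivalent to $\kappa(x, y) = \left(\langle \kappa(\cdot, x), \kappa(\cdot, y) \rangle_{\mathcal{H}}\right)^*$.

One of the most important properties of RKHSs is that norm convergence implies pointwise convergence. More precisely,
let $\{f_n\}_{n \in \mathbb{N}} \subset \mathcal{H}$ be a sequence such that $\lim_n \|f_n - f\|_{\mathcal{H}} = 0$, for some $f \in \mathcal{H}$. Then, the continuity of
$T_x$ gives $\lim_n f_n(x) = \lim_n T_x(f_n) = T_x(f) = f(x)$, for all $x \in X$.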

Although the underlying theory has been developed by mathematicians for general complex
reproducing kernels and their associated RKHSs, only real kernels have been considered by the
machine learning community. One of the most widely used kernels is the Gaussian RBF, i.e.,

$$\kappa_{\sigma,\mathbb{R}^d}(x, y) := \exp\left(-\frac{\sum_{i=1}^{d} (x_i - y_i)^2}{\sigma^2}\right), \tag{2}$$

defined for $x, y \in \mathbb{R}^d$, where $\sigma$ is a free positive parameter. Another popular kernel is the polynomial
kernel: $\kappa_d(x, y) := (1 + x^T y)^d$, for $d \in \mathbb{N}$. Many more can be found in the related literature [7]-[9].

$^2$ The overbar denotes the closure of the set.
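
The definitions above can be illustrated numerically. The following is a minimal sketch, not part of the original derivation: it evaluates the Gaussian kernel of equation (2) and the polynomial kernel, builds a Gram matrix, and probes the positive definiteness condition $c^H K c \geq 0$ with a few random real coefficient vectors. The function names, the sample points and the number of trials are assumptions made for illustration only, and a random probe is of course not a proof of positive definiteness.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Real Gaussian RBF kernel of equation (2): exp(-||x - y||^2 / sigma^2)."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / sigma ** 2)


def polynomial_kernel(x, y, degree=2):
    """Polynomial kernel (1 + x^T y)^d."""
    return (1.0 + sum(xi * yi for xi, yi in zip(x, y))) ** degree


def gram_matrix(kernel, points):
    """Gram matrix K with K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]


def quadratic_form(K, c):
    """c^T K c for a real symmetric matrix K and a real coefficient vector c."""
    n = len(c)
    return sum(c[i] * c[j] * K[i][j] for i in range(n) for j in range(n))


def min_quadratic_form(kernel, points, trials=200, seed=0):
    """Smallest value of c^T K c over a few random coefficient vectors c
    (hypothetical helper: a numerical probe, not a proof)."""
    rng = random.Random(seed)
    K = gram_matrix(kernel, points)
    return min(
        quadratic_form(K, [rng.uniform(-1.0, 1.0) for _ in points])
        for _ in range(trials)
    )
```

For any finite set of points, `min_quadratic_form` should return a nonnegative value (up to rounding) for both kernels, in agreement with the definition of a positive definite kernel.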

Complex reproducing kernels that have been extensively studied by mathematicians are, among
others, the Szego kernels, i.e., $\kappa(z, w) = \frac{1}{1 - w^* z}$, for Hardy spaces on the unit disk, and the Bergman
kernels, i.e., $\kappa(z, w) = \frac{1}{(1 - w^* z)^2}$, for Bergman spaces on the unit disk, where $|z|, |w| < 1$ [25]. In the
following, we discuss another complex kernel that has remained relatively unknown in the Machine
Learning and Signal Processing communities.

B. The Complex Gaussian Kernel 

Consider the complex valued function

$$\kappa_{\sigma,\mathbb{C}^d}(z, w) := \exp\left(-\frac{\sum_{i=1}^{d} (z_i - w_i^*)^2}{\sigma^2}\right), \tag{3}$$

defined on $\mathbb{C}^d \times \mathbb{C}^d$, where $z, w \in \mathbb{C}^d$, $z_i$ denotes the $i$-th component of the complex vector $z \in \mathbb{C}^d$
and $\exp$ is the extended exponential function in the complex domain. It can be shown that $\kappa_{\sigma,\mathbb{C}^d}$ is a
complex valued kernel, which we call the complex Gaussian kernel with parameter $\sigma$. Its restriction
$\kappa_{\sigma} := \left(\kappa_{\sigma,\mathbb{C}^d}\right)|_{\mathbb{R}^d \times \mathbb{R}^d}$ is the well known real Gaussian kernel. An explicit description of the RKHSs of
these kernels, together with some important properties, can be found in [28].
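
As a hedged illustration of equation (3), the sketch below evaluates the complex Gaussian kernel with Python's standard `cmath` module and compares its restriction to real arguments with the real Gaussian kernel. The function names and test vectors are assumptions introduced here for illustration; they do not come from the original text.

```python
import cmath
import math


def complex_gaussian_kernel(z, w, sigma=1.0):
    """Complex Gaussian kernel of equation (3):
    exp(-sum_i (z_i - conj(w_i))^2 / sigma^2)."""
    s = sum((zi - complex(wi).conjugate()) ** 2 for zi, wi in zip(z, w))
    return cmath.exp(-s / sigma ** 2)


def real_gaussian_kernel(x, y, sigma=1.0):
    """Real Gaussian kernel exp(-sum_i (x_i - y_i)^2 / sigma^2)."""
    s = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-s / sigma ** 2)
```

Two properties are easy to check with this sketch: for real arguments the imaginary part vanishes and the value coincides with `real_gaussian_kernel`, and for complex arguments the kernel is Hermitian, i.e., `complex_gaussian_kernel(z, w)` equals the conjugate of `complex_gaussian_kernel(w, z)`. Note also that, unlike the real case, the value at $z = w$ is $\exp(4 \sum_i (\mathrm{Im}\, z_i)^2 / \sigma^2)$, which is real but not bounded by one.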

III. Kernel Least Mean Square Algorithm 

In a typical LMS filter the goal is to learn a linear input-output mapping $f : X \to \mathbb{R} : f(x) = w^T x$,
$X \subseteq \mathbb{R}^\nu$, based on a sequence of examples $(x(1), d(1)), (x(2), d(2)), \dots, (x(N), d(N))$, so as to
minimize the mean square error, $E\left[|d(n) - w^T x(n)|^2\right]$. To this end, the gradient descent rationale
is employed and at each time instant, $n = 1, 2, \dots, N$, the gradient of the mean square error, i.e.,
$-2E[e(n)x(n)]$, is estimated via its current measurement, i.e., $\hat{E}[e(n)x(n)] = e(n)x(n)$, where $e(n) =
d(n) - w(n-1)^T x(n)$ is the a-priori error at instance $n = 2, \dots, N$. It takes a few lines of elementary
algebra to deduce that the update of the unknown vector parameter is: $w(n) = w(n-1) + \mu e(n) x(n)$,
where $\mu$ is the parameter controlling the step update. If we take the initial value of $w$ as $w(0) = 0$, then
the repeated application of the update equation yields:

$$w(n) = \mu \sum_{k=1}^{n} e(k) x(k). \tag{4}$$

Hence, for the filter output at instance $n$ we have:

$$\hat{d}(n) = w(n-1)^T x(n) = \mu \sum_{k=1}^{n-1} e(k) x(k)^T x(n), \tag{5}$$

for $n = 1, 2, \dots, N$. Equation (5) is expressed in terms of inner products only, hence it allows for the
application of the kernel trick. Thus, the filter output of the KLMS at instance $n$ is

$$\hat{d}(n) = \langle \Phi(x(n)), w(n-1) \rangle_{\mathcal{H}} = \mu \sum_{k=1}^{n-1} e(k) \kappa(x(n), x(k)), \tag{6}$$

while

$$w(n) = \mu \sum_{k=1}^{n} e(k) \kappa(\cdot, x(k)), \tag{7}$$

for $n = 1, 2, \dots, N$.
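
Equations (6) and (7) translate almost directly into code. The following is a minimal sketch of the real KLMS recursion, written under several assumptions that are not part of the original text: a real Gaussian kernel as in equation (2), a growing (unsparsified) expansion, and a toy noise-free regression task $d = \sin(x)$ used only to exercise the filter. The names `klms` and `demo`, the step size and the kernel width are illustrative choices.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Real Gaussian kernel exp(-||x - y||^2 / sigma^2) on tuples of floats."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / sigma ** 2)


def klms(inputs, desired, mu=0.5, kernel=gaussian_kernel):
    """Run KLMS over a sequence; return the a-priori errors and the expansion.

    The filter w(n) of equation (7) is stored as the list of centers x(k)
    together with the coefficients mu * e(k).
    """
    centers, coeffs, errors = [], [], []
    for x, d in zip(inputs, desired):
        # Filter output of equation (6): sum over all previous centers.
        d_hat = sum(a * kernel(x, c) for a, c in zip(coeffs, centers))
        e = d - d_hat              # a-priori error e(n)
        centers.append(x)          # add kappa(., x(n)) to the expansion
        coeffs.append(mu * e)
        errors.append(e)
    return errors, centers, coeffs


def demo(n_samples=300, seed=0):
    """Learn d = sin(x) from random samples (a toy task assumed here);
    return the mean squared a-priori error of the first and last 50 samples."""
    rng = random.Random(seed)
    xs = [(rng.uniform(-3.0, 3.0),) for _ in range(n_samples)]
    ds = [math.sin(x[0]) for x in xs]
    errors, _, _ = klms(xs, ds)

    def mse(es):
        return sum(e * e for e in es) / len(es)

    return mse(errors[:50]), mse(errors[-50:])
```

Calling `demo()` returns the early and late mean squared a-priori errors; on this toy task the late error is expected to be the smaller of the two. The expansion grows by one term per sample, which is why practical implementations usually add a sparsification step that is not shown here.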

Another, more formal, way of developing the KLMS is the following. First, we transform the input space
$X$ to a high dimensional feature space $\mathcal{H}$, through the (implicit) mapping $\Phi : X \to \mathcal{H}$, $\Phi(x) = \kappa(\cdot, x)$.
Thus, the training examples become $(\Phi(x(1)), d(1)), \dots, (\Phi(x(N)), d(N))$. We apply the LMS procedure
on the transformed data, with the linear filter output $\hat{d}(n) = \langle \Phi(x(n)), w \rangle_{\mathcal{H}}$. The model $\langle \Phi(x), w \rangle_{\mathcal{H}}$
is more representative than the simple $w^T x$, since it includes the nonlinear modeling through the presence
of the kernel. The objective now becomes to minimize the cost function $E\left[|d(n) - \langle \Phi(x(n)), w \rangle_{\mathcal{H}}|^2\right]$
(see [29]). Using the notion of the Frechet derivative [29]-[31], which has to be mobilized, since the
dimensionality of the RKHS may be infinite, we are able to derive the gradient of the aforementioned
cost function with respect to $w$, if we estimate it by its current measurement $|d(n) - \langle \Phi(x(n)), w \rangle_{\mathcal{H}}|^2$.
Thus the respective gradient is $-2e(n)\Phi(x(n))$. It has to be emphasized, that now $w$ is not a vector, but
a function, i.e., a point in the linear Hilbert space. It turns out that the update of the KLMS is given by
$w(n) = w(n-1) + \mu e(n) \Phi(x(n))$, where $e(n) = d(n) - \hat{d}(n)$. From this update, following the same
procedure as in LMS and applying the reproducing property, we obtain equations (6) and (7), which are
at the core of the KLMS algorithm. More details and the algorithmic implementation may be found in
[1].

Note that in a number of attempts to kernelize known algorithms, that are cast in inner products, the 
kernel trick is, usually, used in a "black box" rationale, without consideration of the problem in the RKH 
space, in which the (implicit) processing is carried out. Such an approach, often, does not allow for a 
deeper understanding of the problem, especially if a further theoretical analysis is required. Moreover, in 
our case, such a "blind" application of the kernel trick on a standard complex LMS form, can only lead 
to spaces defined by complex kernels, as will become clear soon. Complex RKH spaces, that are built
around complexification of real kernels, do not result as a direct application of the standard kernel trick.

IV. Complexification of Real Reproducing Kernel Hilbert Spaces

To generalize the kernel adaptive filtering algorithms to complex domains, we need a universal
framework regarding complex RKHSs. A first straightforward approach is to use directly a complex
RKHS, using one of the complex kernels given in Section II. In this section, we present an alternative
simple technique called complexification of real RKHSs, which has the advantage of allowing modeling
in complex RKHSs using popular, well-established and, from a performance point of view, well understood
real kernels (e.g., Gaussian, polynomial, etc.).

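Let $X \subseteq \mathbb{R}^\nu$. Define $X^2 = X \times X \subseteq \mathbb{R}^{2\nu}$ and $\mathbb{X} = \{x + iy, \; x, y \in X\} \subseteq \mathbb{C}^\nu$ equipped with a complex
product structure. Let $\mathcal{H}$ be a real RKHS associated with a real kernel $\kappa$ defined on $X^2 \times X^2$ and let
$\langle \cdot, \cdot \rangle_{\mathcal{H}}$ be its corresponding inner product. Then, every $f \in \mathcal{H}$ can be regarded as a function defined on
either $X^2$ or $\mathbb{X}$, i.e., $f(z) = f(x + iy) = f(x, y)$.

Next, we define $\mathcal{H}^2 = \mathcal{H} \times \mathcal{H}$. It is easy to verify that $\mathcal{H}^2$ is also a Hilbert Space with inner product

$$\langle f, g \rangle_{\mathcal{H}^2} = \langle f_1, g_1 \rangle_{\mathcal{H}} + \langle f_2, g_2 \rangle_{\mathcal{H}}, \tag{8}$$

for $f = (f_1, f_2)$, $g = (g_1, g_2)$. Our objective is to enrich $\mathcal{H}^2$ with a complex structure. We address this
problem using the complexification of the real RKHS $\mathcal{H}$. To this end, we define the space $\mathbb{H} = \{f =
f_1 + i f_2; \; f_1, f_2 \in \mathcal{H}\}$ equipped with the complex inner product:

$$\langle f, g \rangle_{\mathbb{H}} = \langle f_1, g_1 \rangle_{\mathcal{H}} + \langle f_2, g_2 \rangle_{\mathcal{H}} + i \left( \langle f_2, g_1 \rangle_{\mathcal{H}} - \langle f_1, g_2 \rangle_{\mathcal{H}} \right),$$

for $f = f_1 + i f_2$, $g = g_1 + i g_2$. Hence, $f, g : \mathbb{X} \subseteq \mathbb{C}^\nu \to \mathbb{C}$. It is not difficult to verify that $\mathbb{H}$ is a
complex RKHS with kernel $\kappa$ [25]. We call $\mathbb{H}$ the complexification of $\mathcal{H}$. It can readily be seen that,
although $\mathbb{H}$ is a complex RKHS, its respective kernel is real (i.e., its imaginary part is equal to zero).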
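
A finite dimensional analogue may help to see why the inner product of the complexified space has this form. The sketch below is an illustration added under a simplifying assumption: the real Hilbert space is taken to be $\mathbb{R}^n$ with the usual dot product, standing in for a real RKHS. In that case the complexified inner product should coincide with the standard Hermitian inner product of $\mathbb{C}^n$ (linear in the first argument, antilinear in the second). All function names are hypothetical.

```python
def real_inner(f, g):
    """Inner product of the real space R^n (a stand-in for a real RKHS)."""
    return sum(a * b for a, b in zip(f, g))


def complexified_inner(f1, f2, g1, g2):
    """Inner product of f = f1 + i f2 and g = g1 + i g2 in the complexification."""
    real_part = real_inner(f1, g1) + real_inner(f2, g2)
    imag_part = real_inner(f2, g1) - real_inner(f1, g2)
    return complex(real_part, imag_part)


def hermitian_inner(f, g):
    """Standard inner product on C^n, linear in f and antilinear in g."""
    return sum(a * b.conjugate() for a, b in zip(f, g))
```

Building `f = [complex(a, b) for a, b in zip(f1, f2)]` and likewise `g`, the two functions `complexified_inner(f1, f2, g1, g2)` and `hermitian_inner(f, g)` agree up to rounding, and `complexified_inner(f1, f2, f1, f2)` is real, as required of a norm.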

To complete the presentation of the required framework for working on complex RKHSs using this
rationale, we need a technique to implicitly map the sample data from the complex input space to the
complexified RKHS $\mathbb{H}$. This can be done using the simple rule:

$$\boldsymbol{\Phi}(z) = \boldsymbol{\Phi}(x + iy) = \boldsymbol{\Phi}(x, y) = \Phi(x, y) + i\Phi(x, y), \tag{9}$$

where $\Phi$ is the feature map of the real reproducing kernel $\kappa$, i.e., $\Phi(x, y) = \kappa(\cdot, (x, y))$. It must be
emphasized, that $\boldsymbol{\Phi}$ is not the feature map associated with the complex RKHS $\mathbb{H}$. Furthermore, the
employed kernel is a real one. Therefore, the algorithms derived using this approach cannot be reproduced,
if one blindly applies the kernel trick using any complex kernel. However, observe that:

$$\langle \boldsymbol{\Phi}(z), \boldsymbol{\Phi}(z') \rangle_{\mathbb{H}} = 2 \langle \Phi(x, y), \Phi(x', y') \rangle_{\mathcal{H}} = 2\kappa((x, y), (x', y')).$$

This relation implies that the complexification procedure is equivalent to the following complexified
real kernel trick: Given an algorithm, which is formulated in terms of complex dot products (i.e., $w^H z$,
where $z = x + iy$, $w = w_1 + i w_2$), one can construct an alternative algorithm by replacing each one of
the complex dot products with a positive definite real kernel $\kappa$, with arguments the extended real vectors
of $z$ and $w$ (i.e., $\kappa((x, y), (w_1, w_2))$).
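
The complexified real kernel trick can be sketched in a few lines. The code below is an illustrative assumption rather than the paper's implementation: it stacks the real and imaginary parts of a complex vector into an extended real vector and evaluates a real Gaussian kernel on the stacked vectors, including the factor 2 that appears in the relation above. The helper names `to_real` and `complexified_kernel` are hypothetical.

```python
import math


def real_gaussian_kernel(x, y, sigma=1.0):
    """Real Gaussian kernel exp(-||x - y||^2 / sigma^2)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / sigma ** 2)


def to_real(z):
    """Stack real and imaginary parts: z = x + iy -> (x_1..x_nu, y_1..y_nu)."""
    return tuple(c.real for c in z) + tuple(c.imag for c in z)


def complexified_kernel(z, w, kernel=real_gaussian_kernel):
    """Value of <Phi(z), Phi(w)> in the complexified space: 2 * kappa((x, y), (w1, w2))."""
    return 2.0 * kernel(to_real(z), to_real(w))
```

The returned value is always real, even though the inputs are complex vectors, which reflects the remark that the kernel of the complexified RKHS has zero imaginary part.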

V. Wirtinger's Calculus on C 

Wirtinger's calculus [15] is enjoying increasing popularity in the signal processing community, mainly
in the context of complex adaptive filtering [16]-[23], as a means to compute, in an elegant way, gradients
of real valued cost functions that are defined on complex domains ($\mathbb{C}^\nu$). The Cauchy-Riemann conditions
dictate that such functions are not holomorphic (except in the case where the function is a constant)
and therefore the complex derivative cannot be used. Instead, if we consider that the cost function is
defined on a Euclidean domain with a double dimensionality ($\mathbb{R}^{2\nu}$), then the real derivatives may be
employed. The price of this approach is that the computations may become cumbersome and tedious.
Wirtinger's calculus provides an alternative equivalent formulation, that is based on simple rules and
principles and which bears a great resemblance to the rules of the standard complex derivative. In this
section, we present the main notions of Wirtinger's calculus for functions defined on complex domains.
These ideas are, subsequently, extended in Section VI to include the case of general complex Hilbert
spaces.

Let $f : \mathbb{C} \to \mathbb{C}$ be a complex function defined on $\mathbb{C}$. Obviously, such a function may be regarded
as either defined on $\mathbb{R}^2$ or $\mathbb{C}$ (i.e., $f(z) = f(x + iy) = f(x, y)$). Furthermore, it may be regarded as
either a complex valued function, $f(x, y) = u(x, y) + iv(x, y)$, or as a vector valued function $f(x, y) =
(u(x, y), v(x, y))$. We will say that $f$ is differentiable in the real sense if $u$ and $v$ are differentiable. It
turns out that, when the complex structure is considered, the real derivatives may be described using
an equivalent and more elegant formulation, which bears a surprising resemblance to the complex
derivative. In fact, if the function $f$ is differentiable in the complex sense (i.e., the complex derivative
exists), the developed derivatives coincide with the complex ones. Although this methodology has been known
for some time in the German speaking countries and has been applied to practical applications [32],
[33], only recently has it attracted the attention of the signal processing community, mostly in the context
of works that followed Picinbono's paper on widely linear estimation filters [16].

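The Wirtinger's derivative (or W-derivative for short) of $f$ at a point $c$ is defined as follows:

$$\frac{\partial f}{\partial z}(c) = \frac{1}{2}\left(\frac{\partial f}{\partial x}(c) - i\frac{\partial f}{\partial y}(c)\right) = \frac{1}{2}\left(\frac{\partial u}{\partial x}(c) + \frac{\partial v}{\partial y}(c)\right) + \frac{i}{2}\left(\frac{\partial v}{\partial x}(c) - \frac{\partial u}{\partial y}(c)\right). \tag{10}$$

The conjugate Wirtinger's derivative (or CW-derivative for short) of $f$ at $c$ is defined by:

$$\frac{\partial f}{\partial z^*}(c) = \frac{1}{2}\left(\frac{\partial f}{\partial x}(c) + i\frac{\partial f}{\partial y}(c)\right) = \frac{1}{2}\left(\frac{\partial u}{\partial x}(c) - \frac{\partial v}{\partial y}(c)\right) + \frac{i}{2}\left(\frac{\partial v}{\partial x}(c) + \frac{\partial u}{\partial y}(c)\right). \tag{11}$$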

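The following properties can be proved [21], [34], [35]:

1) If $f$ has a Taylor series expansion with respect to $z$ (i.e., it is holomorphic) around $c$, then
$\frac{\partial f}{\partial z^*}(c) = 0$.

2) If $f$ has a Taylor series expansion with respect to $z^*$ around $c$, then $\frac{\partial f}{\partial z}(c) = 0$.

3) $\left(\frac{\partial f}{\partial z}(c)\right)^* = \frac{\partial f^*}{\partial z^*}(c)$.

4) $\left(\frac{\partial f}{\partial z^*}(c)\right)^* = \frac{\partial f^*}{\partial z}(c)$.

5) Linearity: If $f$, $g$ are differentiable in the real sense at $c$ and $\alpha, \beta \in \mathbb{C}$, then

$$\frac{\partial(\alpha f + \beta g)}{\partial z}(c) = \alpha\frac{\partial f}{\partial z}(c) + \beta\frac{\partial g}{\partial z}(c), \qquad \frac{\partial(\alpha f + \beta g)}{\partial z^*}(c) = \alpha\frac{\partial f}{\partial z^*}(c) + \beta\frac{\partial g}{\partial z^*}(c).$$

6) Product Rule: If $f$, $g$ are differentiable in the real sense at $c$, then

$$\frac{\partial(f \cdot g)}{\partial z}(c) = \frac{\partial f}{\partial z}(c)g(c) + f(c)\frac{\partial g}{\partial z}(c), \qquad \frac{\partial(f \cdot g)}{\partial z^*}(c) = \frac{\partial f}{\partial z^*}(c)g(c) + f(c)\frac{\partial g}{\partial z^*}(c).$$

7) Division Rule: If $f$, $g$ are differentiable in the real sense at $c$ and $g(c) \neq 0$, then

$$\frac{\partial\left(\frac{f}{g}\right)}{\partial z}(c) = \frac{\frac{\partial f}{\partial z}(c)g(c) - f(c)\frac{\partial g}{\partial z}(c)}{g^2(c)}, \qquad \frac{\partial\left(\frac{f}{g}\right)}{\partial z^*}(c) = \frac{\frac{\partial f}{\partial z^*}(c)g(c) - f(c)\frac{\partial g}{\partial z^*}(c)}{g^2(c)}.$$

8) Chain Rule: If $f$ is differentiable in the real sense at $c$ and $g$ is differentiable in the real sense at
$f(c)$, then

$$\frac{\partial (g \circ f)}{\partial z}(c) = \frac{\partial g}{\partial z}(f(c))\frac{\partial f}{\partial z}(c) + \frac{\partial g}{\partial z^*}(f(c))\frac{\partial f^*}{\partial z}(c),$$

$$\frac{\partial (g \circ f)}{\partial z^*}(c) = \frac{\partial g}{\partial z}(f(c))\frac{\partial f}{\partial z^*}(c) + \frac{\partial g}{\partial z^*}(f(c))\frac{\partial f^*}{\partial z^*}(c).$$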

In view of the aforementioned properties, one may easily compute the W and CW derivatives of any
complex function $f$, which is written in terms of $z$ and $z^*$, using the following simple tricks:

• To compute the W-derivative of a function $f$, which is expressed in terms of $z$ and $z^*$,
apply the usual differentiation rules considering $z^*$ as a constant.
• To compute the CW-derivative of a function $f$, which is expressed in terms of $z$ and $z^*$,
apply the usual differentiation rules considering $z$ as a constant.

Note that any complex function $f(z)$, which is differentiable in the real sense, can be cast in terms of
$z$ and $z^*$. For example, if the function $f(z) = f(x + iy) = f(x, y)$ is given in terms of $x$ and $y$, replacing
$x$ by $(z + z^*)/2$ and $y$ by $(z - z^*)/(2i)$ gives the result. It should be emphasized that these statements must
be regarded as a simple computational trick rather than as a rigorous mathematical rule. This trick works
well due to the aforementioned properties. Nonetheless, special care should be taken whenever these
tricks are applied. For example, given the function $f(z) = |z|^2$, we might conclude that $\frac{\partial f}{\partial z^*} = 0$, since
if we consider $z$ as a constant, then $f(z)$ is also a constant. However, since there
isn't any rule regarding the complex norm, this rationale leads to an error. Indeed, if one recasts
$f$ as $f(z) = zz^*$, then one concludes that $\frac{\partial f}{\partial z^*} = z$ and $\frac{\partial f}{\partial z} = z^*$. Similar rules and principles hold for
functions defined on $\mathbb{C}^\nu$ [34].
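
The definitions (10) and (11) can be checked numerically. The sketch below is an illustration added under stated assumptions: it approximates the real partial derivatives of $f$ by central differences (the step size is an arbitrary choice) and combines them as in the definitions of the W- and CW-derivatives. The function name `wirtinger_derivatives` is hypothetical.

```python
def wirtinger_derivatives(f, c, step=1e-6):
    """Approximate the W- and CW-derivatives of f at c by central differences.

    f maps a complex number to a complex (or real) number and is assumed to be
    differentiable in the real sense at c.
    """
    df_dx = (f(c + step) - f(c - step)) / (2 * step)
    df_dy = (f(c + 1j * step) - f(c - 1j * step)) / (2 * step)
    w = 0.5 * (df_dx - 1j * df_dy)    # W-derivative, as in (10)
    cw = 0.5 * (df_dx + 1j * df_dy)   # CW-derivative, as in (11)
    return w, cw
```

For $f(z) = |z|^2$ the function returns approximately $(c^*, c)$, in agreement with the recasting $f(z) = zz^*$, while for the holomorphic function $f(z) = z^2$ the CW-derivative is approximately zero, as stated in property 1.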

VI. Extension of Wirtinger's Calculus to general Hilbert spaces 

To apply minimization algorithms on real valued operators defined on complex RKHSs, we need to 
compute the associated gradients. To this end, in this section, we generalize the main ideas and results
of Wirtinger's calculus to general Hilbert spaces. We begin with a brief review of the Frechet derivative,
which generalizes differentiability to Hilbert spaces and which will be the basis for our discussion. 

A. Frechet Derivatives 

Since Frechet differentiability is not among the mainstream mathematical tools used in the Signal Processing
and Machine Learning communities, we give here some basic definitions for the sake of clarity. Consider
a Hilbert space $H$ over the field $F$ (typically $\mathbb{R}$ or $\mathbb{C}$). The operator $T : H \to F^\nu$ is said to be Frechet
differentiable at $f_0$, if there exists a linear continuous operator $W = (W_1, W_2, \dots, W_\nu) : H \to F^\nu$ such
that

$$\lim_{\|h\|_H \to 0} \frac{\|T(f_0 + h) - T(f_0) - W(h)\|_{F^\nu}}{\|h\|_H} = 0, \tag{12}$$

where $\| \cdot \|_H = \sqrt{\langle \cdot, \cdot \rangle_H}$ is the induced norm of the corresponding Hilbert Space. Note that $F^\nu$ is
considered as a Banach space under the Euclidean norm. The linear operator $W$ is called the Frechet
derivative and is usually denoted by $dT(f_0) : H \to F^\nu$. Observe that this definition is valid not only for
Hilbert spaces, but for general Banach spaces too. However, since we are mainly interested in Hilbert
spaces, we present the main ideas in this context. It can be proved that if such a linear continuous operator
$W$ can be found, then it is unique (i.e., the derivative is unique) [30]. In the special case where $\nu = 1$
(i.e., the operator $T$ takes values in $F$), using the Riesz representation theorem, we may replace the
linear continuous operator $W$ with an inner product. Therefore, the operator $T : H \to F$ is said to be
Frechet differentiable at $f_0$, iff there exists a $w \in H$, such that

$$\lim_{\|h\|_H \to 0} \frac{T(f_0 + h) - T(f_0) - \langle h, w \rangle_H}{\|h\|_H} = 0, \tag{13}$$

where $\langle \cdot, \cdot \rangle_H$ is the dot product of the Hilbert space $H$ and $\| \cdot \|_H$ is the induced norm. The element $w^*$
is usually called the gradient of $T$ at $f_0$ and it is denoted by $w^* = \nabla T(f_0)$.
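
As a brief worked illustration of this definition (added here as a sketch, assuming a real Hilbert space $\mathcal{H}$ and reusing the symbols $d$, $x$, $\Phi$ and $e$ of Section III), consider the instantaneous squared error of the KLMS, $T(w) = (d - \langle \Phi(x), w \rangle_{\mathcal{H}})^2$, and write $e_0 = d - \langle \Phi(x), w_0 \rangle_{\mathcal{H}}$:

```latex
\begin{align*}
T(w_0 + h) - T(w_0)
  &= \bigl(e_0 - \langle h, \Phi(x) \rangle_{\mathcal{H}}\bigr)^2 - e_0^2 \\
  &= \langle h, -2 e_0 \Phi(x) \rangle_{\mathcal{H}}
     + \langle h, \Phi(x) \rangle_{\mathcal{H}}^2 , \\
\frac{\bigl| \langle h, \Phi(x) \rangle_{\mathcal{H}}^2 \bigr|}{\|h\|_{\mathcal{H}}}
  &\le \|h\|_{\mathcal{H}} \, \|\Phi(x)\|_{\mathcal{H}}^2
  \;\longrightarrow\; 0
  \quad \text{as } \|h\|_{\mathcal{H}} \to 0 .
\end{align*}
```

The second line follows from the Cauchy-Schwarz inequality. Hence the defining limit holds with $w = -2 e_0 \Phi(x)$ and, since conjugation is trivial in the real case, $\nabla T(w_0) = -2 e_0 \Phi(x)$, which agrees with the gradient quoted in Section III.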

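For a general vector valued operator $T = (T_1, \dots, T_\nu) : H \to F^\nu$, we may easily derive that if $T$ is
differentiable at $f_0$, then $T_\iota$ is differentiable at $f_0$, for all $\iota = 1, 2, \dots, \nu$, and that

$$dT(f_0)(h) = \begin{pmatrix} \langle h, \nabla T_1(f_0)^* \rangle_H \\ \vdots \\ \langle h, \nabla T_\nu(f_0)^* \rangle_H \end{pmatrix}. \tag{14}$$

To prove this claim, consider that since $T$ is differentiable, there exists a continuous linear operator $W$
such that

$$\lim_{\|h\|_H \to 0} \frac{\|T(f_0 + h) - T(f_0) - W(h)\|_{F^\nu}}{\|h\|_H} = 0 \;\Rightarrow\; \lim_{\|h\|_H \to 0} \frac{|T_\iota(f_0 + h) - T_\iota(f_0) - W_\iota(h)|}{\|h\|_H} = 0,$$

for all $\iota = 1, \dots, \nu$. Thus,

$$\lim_{\|h\|_H \to 0} \frac{T_\iota(f_0 + h) - T_\iota(f_0) - W_\iota(h)}{\|h\|_H} = 0,$$

for all $\iota = 1, 2, \dots, \nu$. The Riesz representation theorem dictates that since $W_\iota$ is a continuous linear
operator, there exists $w_\iota \in H$ such that $W_\iota(h) = \langle h, w_\iota \rangle_H$, for all $\iota = 1, \dots, \nu$. This proves that $T_\iota$ is
differentiable at $f_0$ and that $w_\iota^* = \nabla T_\iota(f_0)$, thus equation (14) holds. The converse is proved similarly.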

The notion of Frechet differentiability may be extended to include also partial derivatives. Consider
the operator $T : H^\mu \to F$ defined on the Hilbert space $H^\mu$ with corresponding inner product:

$$\langle f, g \rangle_{H^\mu} = \sum_{\iota=1}^{\mu} \langle f_\iota, g_\iota \rangle_H,$$

where $f = (f_1, f_2, \dots, f_\mu)$, $g = (g_1, g_2, \dots, g_\mu)$. $T$ is said to be Frechet differentiable at $f_0$ with
respect to $f_\iota$, iff there exists a $w \in H$, such that

$$\lim_{\|h\|_H \to 0} \frac{T(f_0 + [h]_\iota) - T(f_0) - \langle h, w \rangle_H}{\|h\|_H} = 0, \tag{15}$$

where $[h]_\iota = (0, 0, \dots, 0, h, 0, \dots, 0)$ is the element of $H^\mu$ with zero entries everywhere, except at place
$\iota$. The element $w^*$ is called the gradient of $T$ at $f_0$ with respect to $f_\iota$ and it is denoted by $w^* = \nabla_\iota T(f_0)$.
The Frechet partial derivative at $f_0$ with respect to $f_\iota$ is denoted by $\frac{\partial T}{\partial f_\iota}(f_0)$, $\frac{\partial T}{\partial f_\iota}(f_0)(h) = \langle h, w \rangle_H$.

Although it will not be used here, it is interesting to note that it is also possible to define Frechet
derivatives of higher order and a corresponding Taylor's series expansion. In this context, the $n$-th Frechet
derivative of $T$ at $f_0$, denoted as $d^n T(f_0)$, is a multilinear$^3$ map. If $T$ has Frechet derivatives of any
order, it can be expanded as a Taylor series [36], i.e.,

$$T(f_0 + h) = \sum_{n=0}^{\infty} \frac{1}{n!} d^n T(f_0)(h, h, \dots, h). \tag{16}$$

In the relevant literature the term $d^n T(c)(h, h, \dots, h)$ is often replaced by $d^n T(c) \cdot h^n$, which denotes
that the multilinear map $d^n T(c)$ is applied to $(h, h, \dots, h)$.

B. Complex Hilbert spaces 

Let % be a real Hilbert space with inner product (-, -)u and T~L 2 , H the Hilbert spaces defined as shown 
in section |IV] In the following, the complex structure of H will be used to derive derivatives similar to 
the ones obtained from Wirtinger's calculus on C. 

Consider the function T : A C H -> C, T(f) = T(u f + iv f ) = T r (uf,Vf) + iTi(uf,Vf) t defined on 
an open subset A of H, where Uf,Vf G % and T r , T{ are real valued functions defined on T~L 2 . Any such 
function, T, may be regarded as defined either on a subset of H, or on a subset of T~L 2 . Moreover, T 
may be regarded either as a complex valued function, or as a vector valued function, which takes values 
in M 2 . Therefore, we may equivalently write: 

T{f) = T{u f + iv f ) = T r (u f ,v f ) + iTi(u f ,v f ), 
T(f) = (T r ( Uf ,v f ),T l (u f ,v f )). 

In the following, we will often change the notation according to the specific problem and consider any 

element of / G H defined either as f = Uf + ivf el, or as / = (uf,Vf) G Ti 2 . In a similar manner, 

any complex number may be regarded as either an element of C, or as an element of M 2 . We say that 

T is Frechet complex differentiable at c e H, if there exists w G H such that: 

T(c + h) -T(c) - (h,w) M 
lim v ' \ ' x ' ' = 0. (17) 

||h||n-»o \\h 



3 A function is called multilinear, if it is linear in each variable. 
Then $w^*$ is called the complex gradient of $T$ at $c$ and it is denoted as $w^* = \nabla T(c)$. The Frechet
complex derivative of $T$ at $c$ is denoted as $dT(c)(h) = \langle h, w \rangle_{\mathbb{H}}$. This definition, although similar to
the typical Frechet derivative, exploits the complex structure of $\mathbb{H}$. More specifically, the complex inner
product that appears in the definition forces a great deal of structure on $T$. Similarly to the case of
ordinary complex functions, it is this simple fact that gives rise to all the important strong properties
of the complex derivative. For example, it can be proved that, if $dT(c)$ exists, then so does $d^n T(c)$, for
$n \in \mathbb{N}$. If $T$ is differentiable at any $c \in A$, $T$ is called Frechet holomorphic in $A$, or Frechet complex
analytic in $A$, in the sense that it can be expanded as a Taylor series, i.e.,

$$T(c + h) = \sum_{n=0}^{\infty} \frac{1}{n!} d^n T(c)(h, h, \dots, h). \qquad (18)$$

The proof of this statement is beyond the scope of this paper. The interested reader may dig deeper into this
subject by referring to [36]. We begin our study by exploring the relations between the complex Frechet
derivative and the real Frechet derivatives. In the following, we will say that $T$ is Frechet differentiable
in the complex sense, if the complex derivative exists, and that $T$ is Frechet differentiable in the real
sense, if its real Frechet derivative exists (i.e., $T$ is regarded as a vector valued operator $T : \mathcal{H}^2 \to \mathbb{R}^2$).
Similarly, the expression "$T$ is Frechet complex analytic at $c$" means that $T$ is Frechet complex analytic
in a neighborhood of $c$. We will say that $T$ is Frechet real analytic, when both $T_r$ and $T_i$ have a
Taylor series expansion in the real sense.

Proposition VI.1. Let $T : A \subseteq \mathbb{H} \to \mathbb{C}$ be an operator such that $T(f) = T(u_f + i v_f) = T(u_f, v_f) =
T_r(u_f, v_f) + i T_i(u_f, v_f)$. If the Frechet complex derivative of $T$ at a point $c \in A$ (i.e., $dT(c) : \mathbb{H} \to \mathbb{C}$)
exists, then $T_r$ and $T_i$ are differentiable at the point $c = (c_1, c_2) = c_1 + i c_2$, where $c_1, c_2 \in \mathcal{H}$. Furthermore,

$$\nabla_u T_r(c_1, c_2) = \nabla_v T_i(c_1, c_2), \qquad \nabla_v T_r(c_1, c_2) = -\nabla_u T_i(c_1, c_2). \qquad (19)$$

Equations (19) are the Cauchy-Riemann conditions with respect to the Frechet notion of differentiability.
Similar to the simple case of complex valued functions, they provide a necessary and sufficient condition
for a complex operator $T$, defined on $\mathbb{H}$, to be differentiable in the complex sense, provided that
$T$ is differentiable in the real sense. This is explored in the following proposition.

Proposition VI.2. If the operator $T : A \subseteq \mathbb{H} \to \mathbb{C}$, $T(f) = T_r(f) + i T_i(f)$, where $f = u_f + i v_f$,
is Frechet differentiable in the real sense at a point $(c_1, c_2) \in \mathcal{H}^2$ and the Frechet Cauchy-Riemann
conditions hold:

$$\nabla_u T_r(c_1, c_2) = \nabla_v T_i(c_1, c_2), \qquad \nabla_v T_r(c_1, c_2) = -\nabla_u T_i(c_1, c_2), \qquad (20)$$

then $T$ is differentiable in the complex sense at the point $c = (c_1, c_2) = c_1 + c_2 i \in \mathbb{H}$.

Proof: See Appendix A. ■
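As a quick numerical illustration of Propositions VI.1 and VI.2, the sketch below checks the Cauchy-Riemann conditions in the simplest possible setting, which is an assumption of this example and not the general RKHS case: $\mathcal{H} = \mathbb{R}$, so that $\mathbb{H} = \mathbb{C}$ and the gradients $\nabla_u$, $\nabla_v$ reduce to ordinary partial derivatives. The helper names `partial_derivatives` and `satisfies_cauchy_riemann` are hypothetical, and the derivatives are approximated by central differences.

```python
def partial_derivatives(T, c, eps=1e-6):
    """Central-difference partials of T with respect to u = Re(f) and v = Im(f)."""
    dT_du = (T(c + eps) - T(c - eps)) / (2 * eps)
    dT_dv = (T(c + 1j * eps) - T(c - 1j * eps)) / (2 * eps)
    return complex(dT_du), complex(dT_dv)


def satisfies_cauchy_riemann(T, c, tol=1e-6):
    """Check grad_u T_r = grad_v T_i and grad_v T_r = -grad_u T_i at the point c."""
    dT_du, dT_dv = partial_derivatives(T, c)
    first = abs(dT_du.real - dT_dv.imag) < tol
    second = abs(dT_dv.real + dT_du.imag) < tol
    return first and second


c = 0.7 - 0.3j
# f -> f^2 is complex differentiable, so the conditions hold.
print(satisfies_cauchy_riemann(lambda f: f * f, c))
# f -> |f|^2 is differentiable only in the real sense, so they fail.
print(satisfies_cauchy_riemann(lambda f: abs(f) ** 2, c))
```

The second function is exactly the kind of real valued cost that appears in adaptive filtering, which is why the Wirtinger-type expansion that follows is needed.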
If the Frechet Cauchy-Riemann conditions are not satisfied for an operator $T$, then the Frechet complex
derivative does not exist and the function cannot be expressed in terms of $h$ alone, as in the case of Frechet
complex differentiable functions (see equation (18)). Nevertheless, if $T$ is Frechet differentiable in the real
sense (i.e., $T_r$ and $T_i$ are Frechet differentiable), we may still find a form of Taylor series expansion
by utilizing the extension of Wirtinger's calculus. It can be shown (see the proof of Proposition VI.2 in
Appendix A) that:

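$$T(c + h) = T(c) + \frac{1}{2} \langle h, (\nabla_u T(c) - i \nabla_v T(c))^* \rangle_{\mathbb{H}}
+ \frac{1}{2} \langle h^*, (\nabla_u T(c) + i \nabla_v T(c))^* \rangle_{\mathbb{H}} + o(\|h\|_{\mathbb{H}}). \qquad (21)$$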

One may notice that in this case the associated Taylor expansion is cast in terms of both $h$ and $h^*$.
This can be generalized for higher order Taylor expansion formulas by following the same rationale.
Observe also that, if $T$ is Frechet complex differentiable, this relation degenerates (due to the Cauchy-Riemann
conditions) to the respective Taylor expansion formula (i.e., (18)). In this context, the following
definitions come naturally.

We define the Frechet Wirtinger's gradient (or W-gradient for short) of $T$ at $c$ as

$$\nabla_f T(c) = \frac{1}{2} (\nabla_1 T(c) - i \nabla_2 T(c)) = \frac{1}{2} (\nabla_u T_r(c) + \nabla_v T_i(c))
+ \frac{i}{2} (\nabla_u T_i(c) - \nabla_v T_r(c)), \qquad (22)$$

and the Frechet Wirtinger's derivative (or W-derivative) as $\frac{\partial T}{\partial f}(c) : \mathbb{H} \to \mathbb{C}$, such that $\frac{\partial T}{\partial f}(c)(h) =
\langle h, (\nabla_f T(c))^* \rangle_{\mathbb{H}}$. Consequently, the Frechet conjugate Wirtinger's gradient (or CW-gradient for short)
and the Frechet conjugate Wirtinger's derivative (or CW-derivative) of $T$ at $c$ are defined by:

$$\nabla_{f^*} T(c) = \frac{1}{2} (\nabla_1 T(c) + i \nabla_2 T(c)) = \frac{1}{2} (\nabla_u T_r(c) - \nabla_v T_i(c))
+ \frac{i}{2} (\nabla_u T_i(c) + \nabla_v T_r(c)), \qquad (23)$$

and $\frac{\partial T}{\partial f^*}(c) : \mathbb{H} \to \mathbb{C}$, such that $\frac{\partial T}{\partial f^*}(c)(h) = \langle h, (\nabla_{f^*} T(c))^* \rangle_{\mathbb{H}}$. Note that both the W-derivative and
the CW-derivative exist, if $T$ is Frechet differentiable in the real sense. In view of these new definitions,
equation (21) may now be recast as follows

$$T(c + h) = T(c) + \langle h, (\nabla_f T(c))^* \rangle_{\mathbb{H}} + \langle h^*, (\nabla_{f^*} T(c))^* \rangle_{\mathbb{H}} + o(\|h\|_{\mathbb{H}}). \qquad (24)$$

From these definitions, several properties can be derived:

1) If $T(f)$ is $f$-holomorphic at $c$ (i.e., it has a Taylor series expansion with respect to $f$ at $c$), then
its Frechet W-derivative at $c$ degenerates to the standard Frechet complex derivative and its Frechet
CW-derivative vanishes, i.e., $\nabla_{f^*} T(c) = 0$.

2) If $T(f)$ is $f^*$-holomorphic at $c$ (i.e., it has a Taylor series expansion with respect to $f^*$ at $c$), then
$\nabla_f T(c) = 0$.

3) The first order Taylor expansion around $f \in \mathbb{H}$ is given by

$$T(f + h) = T(f) + \langle h, (\nabla_f T(f))^* \rangle_{\mathbb{H}} + \langle h^*, (\nabla_{f^*} T(f))^* \rangle_{\mathbb{H}}.$$

4) If $T(f) = \langle f, w \rangle_{\mathbb{H}}$, then $\nabla_f T(c) = w^*$, $\nabla_{f^*} T(c) = 0$, for every $c$.

5) If $T(f) = \langle f^*, w \rangle_{\mathbb{H}}$, then $\nabla_f T(c) = 0$, $\nabla_{f^*} T(c) = w^*$, for every $c$.

6) Linearity: If $T, S : \mathbb{H} \to \mathbb{C}$ are Frechet differentiable in the real sense at $c \in \mathbb{H}$ and $\alpha, \beta \in \mathbb{C}$, then

$$\nabla_f (\alpha T + \beta S)(c) = \alpha \nabla_f T(c) + \beta \nabla_f S(c),$$
$$\nabla_{f^*} (\alpha T + \beta S)(c) = \alpha \nabla_{f^*} T(c) + \beta \nabla_{f^*} S(c).$$

A complete list of the derived properties, together with the proofs of the most important ones, is given
in Appendix B.
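Definitions (22) and (23) and properties 4 and 5 can be checked numerically in the scalar case. The sketch below assumes $\mathcal{H} = \mathbb{R}$ (so $\mathbb{H} = \mathbb{C}$ and $\langle f, w \rangle = f w^*$), approximates $\nabla_u$ and $\nabla_v$ by central differences, and uses the hypothetical helper name `wirtinger_gradients`; it is an illustration, not part of the paper's derivation.

```python
def wirtinger_gradients(T, c, eps=1e-6):
    """Return (W-gradient, CW-gradient) of T at c, following (22) and (23)."""
    dT_du = (T(c + eps) - T(c - eps)) / (2 * eps)            # gradient w.r.t. Re(f)
    dT_dv = (T(c + 1j * eps) - T(c - 1j * eps)) / (2 * eps)  # gradient w.r.t. Im(f)
    w_grad = 0.5 * (dT_du - 1j * dT_dv)
    cw_grad = 0.5 * (dT_du + 1j * dT_dv)
    return w_grad, cw_grad


w = 0.3 - 1.1j
c = 0.5 + 0.2j

# Property 4: T(f) = <f, w> = f * conj(w)  ->  W-gradient w*, CW-gradient 0.
print(wirtinger_gradients(lambda f: f * w.conjugate(), c))

# Property 5: T(f) = <f*, w> = conj(f) * conj(w)  ->  W-gradient 0, CW-gradient w*.
print(wirtinger_gradients(lambda f: f.conjugate() * w.conjugate(), c))

# Real valued T(f) = |f|^2: the two gradients form a conjugate pair (c*, c).
print(wirtinger_gradients(lambda f: abs(f) ** 2, c))
```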

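An important consequence of the previous properties is that, if $T$ is a real valued operator defined on
$\mathbb{H}$, then $(\nabla_f T(c))^* = \nabla_{f^*} T(c)$, and its first order Taylor expansion is given by:

$$T(f + h) = T(f) + \langle h, (\nabla_f T(f))^* \rangle_{\mathbb{H}} + \langle h^*, (\nabla_{f^*} T(f))^* \rangle_{\mathbb{H}}$$
$$= T(f) + \langle h, \nabla_{f^*} T(f) \rangle_{\mathbb{H}} + \left( \langle h, \nabla_{f^*} T(f) \rangle_{\mathbb{H}} \right)^* = T(f) + 2 \cdot \Re \left[ \langle h, \nabla_{f^*} T(f) \rangle_{\mathbb{H}} \right].$$

However, in view of the Cauchy-Schwarz inequality we have:

$$\Re \left[ \langle h, \nabla_{f^*} T(f) \rangle_{\mathbb{H}} \right] \le \left| \langle h, \nabla_{f^*} T(f) \rangle_{\mathbb{H}} \right| \le \|h\|_{\mathbb{H}} \cdot \|\nabla_{f^*} T(f)\|_{\mathbb{H}}.$$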

The equality in the above relationship holds if $h \upuparrows \nabla_{f^*} T$ (where the notation $\upuparrows$ denotes that $h$ and
$\nabla_{f^*} T$ have the same direction, i.e., there is a $\lambda > 0$, such that $h = \lambda \nabla_{f^*} T$). Hence, the direction of
increase of $T$ is $\nabla_{f^*} T(f)$. Therefore, any gradient descent based algorithm minimizing $T(f)$ is based
on the update scheme:

$$f_n = f_{n-1} - \mu \cdot \nabla_{f^*} T(f_{n-1}). \qquad (25)$$
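A toy instance of the update scheme (25) may make the role of the CW-gradient concrete. The sketch assumes the scalar case $\mathbb{H} = \mathbb{C}$ and the illustrative cost $T(f) = |f - a|^2$, whose CW-gradient is $f - a$ and whose W-gradient is the conjugate $(f - a)^*$; the function name `minimise` and the step size are hypothetical choices.

```python
def minimise(a, mu=0.5, steps=60, use_cw=True):
    """Iterate f_n = f_{n-1} - mu * g(f_{n-1}) for T(f) = |f - a|^2, starting at f = 0."""
    f = 0j
    for _ in range(steps):
        cw_grad = f - a                 # CW-gradient of |f - a|^2
        w_grad = cw_grad.conjugate()    # W-gradient (conjugate pair, T is real valued)
        f = f - mu * (cw_grad if use_cw else w_grad)
    return f


target = 2 - 1j
print(minimise(target))                 # converges to the minimiser a
print(minimise(target, use_cw=False))   # stepping along the W-gradient drifts away
```

Stepping along the W-gradient shrinks the real part of the error but grows its imaginary part, which illustrates why (25) is written with $\nabla_{f^*}$.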

Assuming differentiability of $T$, a standard result from Frechet real calculus states that a necessary
condition for a point $c$ to be an optimum (in the sense that $T(f)$ is minimized or maximized) is that
this point is a stationary point of $T$, i.e., the Frechet partial derivatives of $T$ at $c$ vanish. In the context
of Wirtinger's calculus, we have the following obvious corresponding result.

Proposition VI.3. If the function $T : A \subseteq \mathbb{H} \to \mathbb{R}$ is Frechet differentiable at $c$ in the real sense,
then a necessary condition for a point $c$ to be a local optimum (in the sense that $T(c)$ is minimized or
maximized) is that either the Frechet W-derivative, or the Frechet CW-derivative vanishes.

Proof: Observe that, if $T$ is real valued, the Wirtinger gradients take the form $\nabla_f T(c) = \frac{1}{2} (\nabla_u T(c) -
i \nabla_v T(c))$ and $\nabla_{f^*} T(c) = \frac{1}{2} (\nabla_u T(c) + i \nabla_v T(c))$. If $c$ is a local optimum of $T$, then $\nabla_u T(c) =
\nabla_v T(c) = 0$ and thus $\nabla_f T(c) = \nabla_{f^*} T(c) = 0$. Note that, for real valued functions, the W and the
CW derivatives constitute a conjugate pair. Thus, if the W derivative vanishes, then the CW derivative
vanishes too. The converse is also true. This completes the proof. ■

VII. Complex Kernel Least Mean Squares - CKLMS 

In order to illustrate how the proposed framework may be applied to problems of complex signal
processing, we present two realizations of the Kernel Least Mean Squares (KLMS) algorithm for complex
data. The first scheme (CKLMS1) employs the complexification of real reproducing kernels (see Section
IV), while the second one uses pure complex kernels (CKLMS2). Wirtinger's calculus is exploited in
both cases to compute the necessary gradient updates.

A. Complex KLMS via complexification of real kernels - CKLMS1 

Consider the sequence of examples $(z(1), d(1)), (z(2), d(2)), \dots, (z(N), d(N))$, where $d(n) \in \mathbb{C}$,
$z(n) \in V \subset \mathbb{C}^{\nu}$, $z(n) = x(n) + i y(n)$, $x(n), y(n) \in \mathbb{R}^{\nu}$, for $n = 1, \dots, N$. Consider, also, a real
reproducing kernel $\kappa$ defined on $X \times X$, $X \subseteq \mathbb{R}^{2\nu}$, and let $\mathcal{H}$ be the corresponding RKHS. We map the
points $z(n)$ to the RKHS $\mathbb{H}$ ($\mathbb{H}$ is constructed as explained in Section IV) using the mapping

$$\boldsymbol{\Phi}(z(n)) = \Phi(z(n)) + i \Phi(z(n)) = \kappa(\cdot, (x(n), y(n))) + i \cdot \kappa(\cdot, (x(n), y(n))),$$

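for $n = 1, \dots, N$, where $\Phi$ is the feature map of $\mathcal{H}$. The objective of the complex Kernel LMS is to
design a filter, $w$, whose output $\hat{d}(n) = \langle \boldsymbol{\Phi}(z(n)), w \rangle_{\mathbb{H}}$ estimates the desired response $d(n)$, so as to
minimize $E[\mathcal{L}_n(w)]$, where

$$\mathcal{L}_n(w) = |e(n)|^2 = |d(n) - \langle \boldsymbol{\Phi}(z(n)), w \rangle_{\mathbb{H}}|^2$$
$$= (d(n) - \langle \boldsymbol{\Phi}(z(n)), w \rangle_{\mathbb{H}}) (d(n) - \langle \boldsymbol{\Phi}(z(n)), w \rangle_{\mathbb{H}})^*$$
$$= (d(n) - \langle w^*, \boldsymbol{\Phi}^*(z(n)) \rangle_{\mathbb{H}}) (d(n)^* - \langle w, \boldsymbol{\Phi}(z(n)) \rangle_{\mathbb{H}}),$$

at each instance $n$. We then apply the complex LMS to the transformed data, estimating the mean square
error by its current measurement $\hat{E}[\mathcal{L}_n(w)] = \mathcal{L}_n(w)$, and using the rules of Wirtinger's calculus to compute
the CW gradient, i.e., $\nabla_{w^*} \mathcal{L}_n(w) = -e(n)^* \cdot \boldsymbol{\Phi}(z(n))$. Therefore the CKLMS1 update rule becomes:

$$w(n) = w(n - 1) + \mu e(n)^* \cdot \boldsymbol{\Phi}(z(n)), \qquad (26)$$

where $w(n)$ denotes the estimate at iteration $n$.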

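Assuming that $w(0) = 0$, the repeated application of the weight-update equation gives:

$$w(n) = w(n - 1) + \mu e(n)^* \boldsymbol{\Phi}(z(n)) = w(n - 2) + \mu e(n - 1)^* \boldsymbol{\Phi}(z(n - 1)) + \mu e(n)^* \boldsymbol{\Phi}(z(n))$$
$$= \mu \sum_{k=1}^{n} e(k)^* \boldsymbol{\Phi}(z(k)). \qquad (27)$$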

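Thus, the filter output at iteration $n$ becomes:

$$\hat{d}(n) = \langle \boldsymbol{\Phi}(z(n)), w(n - 1) \rangle_{\mathbb{H}} = \mu \sum_{k=1}^{n-1} e(k) \langle \boldsymbol{\Phi}(z(n)), \boldsymbol{\Phi}(z(k)) \rangle_{\mathbb{H}}$$
$$= 2\mu \sum_{k=1}^{n-1} \Re[e(k)] \kappa(z(n), z(k)) + 2\mu \cdot i \sum_{k=1}^{n-1} \Im[e(k)] \kappa(z(n), z(k)), \qquad (28)$$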

where the evaluation of the kernel is done by replacing the complex vectors $z(n)$ of $\mathbb{C}^{\nu}$ with the
corresponding real vectors of $\mathbb{R}^{2\nu}$, i.e., $z(n) = (x(n), y(n))$.
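The factor 2 in (28) comes from the complexified feature map. A short worked step, assuming only that the inner product of $\mathbb{H}$ is linear in its first argument and conjugate-linear in its second (as the passage from (27) to (28) implies) and that $\kappa$ has the reproducing property in $\mathcal{H}$:

```latex
\begin{align*}
\langle \boldsymbol{\Phi}(z(n)), \boldsymbol{\Phi}(z(k)) \rangle_{\mathbb{H}}
  &= \langle (1+i)\,\Phi(z(n)),\ (1+i)\,\Phi(z(k)) \rangle_{\mathbb{H}} \\
  &= (1+i)(1+i)^{*}\, \langle \Phi(z(n)), \Phi(z(k)) \rangle_{\mathcal{H}} \\
  &= 2\,\kappa(z(n), z(k)), \\
\mu \sum_{k=1}^{n-1} e(k) \cdot 2\,\kappa(z(n), z(k))
  &= 2\mu \sum_{k=1}^{n-1} \Re[e(k)]\,\kappa(z(n), z(k))
   + 2\mu \cdot i \sum_{k=1}^{n-1} \Im[e(k)]\,\kappa(z(n), z(k)).
\end{align*}
```

The same computation with $n = k$ gives $\|\boldsymbol{\Phi}(z(n))\|^2_{\mathbb{H}} = 2\kappa(z(n), z(n))$, the quantity that a normalized update divides by.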

It can readily be shown that, since the CKLMS1 is the complex LMS in RKHS, the important properties
of the LMS (convergence in the mean, misadjustment, etc.) carry over to CKLMS1. Furthermore, we
may also define a normalized version, which we call Normalized Complex Kernel LMS (NCKLMS1).
The weight-update of the NCKLMS1 is given by:

$$w(n) = w(n - 1) + \frac{\mu}{2 \cdot \kappa(z(n), z(n))} e(n)^* \cdot \boldsymbol{\Phi}(z(n)).$$

The NCKLMS1 algorithm is summarized in Algorithm 1. We should emphasize that this formulation of
the complex KLMS cannot be derived following the usual "black box" rationale of the kernel trick, as
has already been pointed out in Section IV. The complexified real kernel trick can be used instead.

One might think that modeling the desired response as $\hat{d}(n) = \langle w(n - 1), \boldsymbol{\Phi}(z(n)) \rangle_{\mathbb{H}}$ provides an
alternative formulation for the CKLMS1 algorithm. In this case, the CW gradient of the instantaneous
square error is given by $\nabla_{w^*} \mathcal{L}_n(w) = -e(n) \boldsymbol{\Phi}(z(n))$. Following the same procedure, we conclude that
the update rule becomes $w(n) = w(n - 1) + \mu e(n) \cdot \boldsymbol{\Phi}(z(n))$ and, assuming that $w(0) = 0$, one
concludes that:

$$w(n) = \mu \sum_{k=1}^{n} e(k) \boldsymbol{\Phi}(z(k)).$$
Algorithm 1 Normalized Complex Kernel LMS with complexification of real kernels (NCKLMS1)
INPUT: $(z(1), d(1)), \dots, (z(N), d(N))$

OUTPUT: The expansion
$w = \sum_{k=1}^{N} a(k) \kappa(\cdot, z(k)) + i \cdot \sum_{k=1}^{N} b(k) \kappa(\cdot, z(k))$.

Initialization: Set $a = \{\}$, $b = \{\}$, $Z = \{\}$ (i.e., $w = 0$). Select the step parameter $\mu$ and the
kernel $\kappa$.

for n = 1 : N do
    Compute the filter output:
    $\hat{d}(n) = \sum_{k=1}^{n-1} (a(k) + b(k)) \cdot \kappa(z(n), z(k)) + i \sum_{k=1}^{n-1} (a(k) - b(k)) \cdot \kappa(z(n), z(k))$.
    Compute the error: $e(n) = d(n) - \hat{d}(n)$.
    $\gamma = 2 \kappa(z(n), z(n))$.
    $a(n) = \mu (\Re[e(n)] + \Im[e(n)]) / \gamma$.
    $b(n) = \mu (\Re[e(n)] - \Im[e(n)]) / \gamma$.
    Add the new center $z(n)$ to the list of centers, i.e., add $z(n)$ to the list $Z$, add $a(n)$ to the list $a$,
    add $b(n)$ to the list $b$.
end for



However, although this relation is different from equation (27), the filter output at iteration $n$, for this filter,
turns out to be exactly the same as before:

$$\hat{d}(n) = \langle w(n - 1), \boldsymbol{\Phi}(z(n)) \rangle_{\mathbb{H}} = \mu \sum_{k=1}^{n-1} e(k) \langle \boldsymbol{\Phi}(z(k)), \boldsymbol{\Phi}(z(n)) \rangle_{\mathbb{H}},$$

which is in line with what we know for the standard complex LMS.
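The steps of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the real Gaussian kernel $\exp(-\|x - y\|^2 / \sigma^2)$ evaluated on complex vectors viewed as real vectors $(x, y)$, it omits sparsification (every sample becomes a center), and the names `gaussian_kernel` and `ncklms1` are hypothetical.

```python
import math


def gaussian_kernel(z, w, sigma=5.0):
    """Real Gaussian kernel; |z_i - w_i|^2 equals the squared distance of (x, y) pairs."""
    sq_dist = sum(abs(zi - wi) ** 2 for zi, wi in zip(z, w))
    return math.exp(-sq_dist / sigma ** 2)


def ncklms1(samples, mu=0.5, kernel=gaussian_kernel):
    """Run NCKLMS1 over (z, d) pairs; z is a tuple of complex numbers, d is complex."""
    a, b, centers, errors = [], [], [], []
    for z, d in samples:
        # Filter output: sum (a+b) k(z, z_k) + i * sum (a-b) k(z, z_k).
        re = sum((ak + bk) * kernel(z, c) for ak, bk, c in zip(a, b, centers))
        im = sum((ak - bk) * kernel(z, c) for ak, bk, c in zip(a, b, centers))
        e = d - complex(re, im)
        gamma = 2.0 * kernel(z, z)          # squared norm of the complexified feature
        a.append(mu * (e.real + e.imag) / gamma)
        b.append(mu * (e.real - e.imag) / gamma)
        centers.append(z)
        errors.append(e)
    return a, b, centers, errors


# Feeding the same sample repeatedly: the error shrinks by a factor (1 - mu) per step.
a, b, centers, errors = ncklms1([((1 + 1j,), 2 - 1j)] * 30, mu=0.5)
print(abs(errors[0]), abs(errors[-1]))
```

Because the stored lists grow with every sample, a practical run would combine this loop with a sparsification rule.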

B. Complex KLMS with pure complex kernels - CKLMS2 

As in Section VII-A, consider the sequence of examples $(z(1), d(1)), (z(2), d(2)), \dots, (z(N), d(N))$,
where $d(n) \in \mathbb{C}$, $z(n) \in V \subset \mathbb{C}^{\nu}$, $z(n) = x(n) + i y(n)$, $x(n), y(n) \in \mathbb{R}^{\nu}$, for $n = 1, \dots, N$. Consider
also a complex reproducing kernel $\kappa$ defined on $X \times X$, $X \subseteq \mathbb{C}^{\nu}$, and the respective complex RKHS
$\mathbb{H}$. Each element $f \in \mathbb{H}$ may be cast in the form $f = u_f + i v_f$, $u_f, v_f \in \mathcal{H}$, where $\mathcal{H}$ is a real
Hilbert space. We map the points $z(n)$ to the complex RKHS $\mathbb{H}$ using the feature map $\Phi : X \to \mathbb{H}$:
$\Phi(z) = \langle \cdot, \kappa(\cdot, z) \rangle_{\mathbb{H}}$, for $n = 1, \dots, N$. Estimating the filter output by $\hat{d}(n) = \langle \Phi(z(n)), w \rangle_{\mathbb{H}}$, the
objective of the complex Kernel LMS is to minimize $E[\mathcal{L}_n(w)]$, at each instance $n$. Once more, we
apply the complex LMS to the transformed data, using the rules of Wirtinger's calculus to compute the
gradient of $\mathcal{L}_n(w)$, i.e., $\nabla_{w^*} \mathcal{L}_n(w) = -e(n)^* \cdot \Phi(z(n))$. Therefore, the CKLMS2 update rule becomes
$w(n) = w(n - 1) + \mu e(n)^* \cdot \Phi(z(n))$, as expected, where $w(n)$ denotes the estimate at iteration $n$.
Assuming that $w(0) = 0$, the repeated application of the weight-update equation gives:

$$w(n) = \mu \sum_{k=1}^{n} e(k)^* \Phi(z(k)). \qquad (29)$$

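Thus, the filter output at iteration $n$ becomes:

$$\hat{d}(n) = \langle \Phi(z(n)), w(n - 1) \rangle_{\mathbb{H}} = \mu \sum_{k=1}^{n-1} e(k) \langle \Phi(z(n)), \Phi(z(k)) \rangle_{\mathbb{H}} = \mu \sum_{k=1}^{n-1} e(k) \kappa(z(k), z(n)).$$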

We should note, that the CKLMS2 algorithm may be equivalently derived, if one blindly applies the 
kernel trick on the complex LMS. However, such an approach conceals the mathematical framework 
that lies underneath, which is needed if one seeks a deeper understanding of the problem. The repeated 
application of the update equation of the CLMS yields: 

$$w(n) = \mu \sum_{k=1}^{n} e(k)^* z(k),$$

while the filter output at iteration $n$ is given by:

$$\hat{d}(n) = \mu \sum_{k=1}^{n-1} e(k) z(n)^H z(k),$$

where the notation $\cdot^H$ denotes the Hermitian (conjugate) transpose. It is evident that the application of the kernel trick
on these equations yields the same results.

Furthermore, note that, using the complex Gaussian kernel, the algorithm is automatically normalized.
The CKLMS2 algorithm is summarized in Algorithm 2.

Another formulation of the CKLMS2 algorithm may be derived if we estimate the filter output as
$\hat{d}(n) = \langle w(n - 1), \Phi(z(n)) \rangle_{\mathbb{H}}$. Then the update rule becomes

$$w(n) = w(n - 1) + \mu e(n) \cdot \Phi(z(n)).$$

Assuming that $w(0) = 0$, the repeated application of the weight-update equation gives:

$$w(n) = \mu \sum_{k=1}^{n} e(k) \Phi(z(k)),$$

and the filter output at iteration $n$ becomes:

$$\hat{d}(n) = \mu \sum_{k=1}^{n-1} e(k) \kappa(z(n), z(k)). \qquad (30)$$

Note that the two formulations of the CKLMS2 are not identical, as was the case for CKLMS1. However,
all the simulated experiments that we performed, using the complex Gaussian kernel, exhibited similar
performance (in terms of signal to noise ratio, SNR).

Algorithm 2 Normalized Complex Kernel LMS2 (NCKLMS2)
INPUT: $(z(1), d(1)), \dots, (z(N), d(N))$

OUTPUT: The expansion coefficients $a(k)$ and the centers $z(k)$, $k = 1, \dots, N$.

Initialization: Set $a = \{\}$, $Z = \{\}$ (i.e., $w = 0$). Select the step parameter $\mu$ and the parameter $\sigma$ of
the complex Gaussian kernel.

for n = 1 : N do
    Compute the filter output:
    $\hat{d}(n) = \sum_{k=1}^{n-1} a(k) \cdot \kappa(z(k), z(n))$.
    Compute the error: $e(n) = d(n) - \hat{d}(n)$.
    $\gamma = \kappa(z(n), z(n))$.
    $a(n) = \mu e(n) / \gamma$.
    Add the new center $z(n)$ to the list of centers, i.e., add $z(n)$ to the list $Z$, add $a(n)$ to the list $a$.
end for
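Algorithm 2 can be sketched in the same way. The code below is an illustration under assumptions rather than the authors' implementation: it uses the complex Gaussian kernel in the form $\exp(-\sum_i (z_i - w_i^*)^2 / \sigma^2)$ quoted in the experiments section, it omits sparsification, and the names `complex_gaussian_kernel` and `ncklms2` are hypothetical.

```python
import cmath


def complex_gaussian_kernel(z, w, sigma=5.0):
    """Complex Gaussian kernel exp(-sum_i (z_i - conj(w_i))^2 / sigma^2)."""
    s = sum((zi - wi.conjugate()) ** 2 for zi, wi in zip(z, w))
    return cmath.exp(-s / sigma ** 2)


def ncklms2(samples, mu=0.25, kernel=complex_gaussian_kernel):
    """Run NCKLMS2 over (z, d) pairs; z is a tuple of complex numbers, d is complex."""
    a, centers, errors = [], [], []
    for z, d in samples:
        # Filter output: sum_k a(k) * kappa(z(k), z(n)).
        d_hat = sum(ak * kernel(c, z) for ak, c in zip(a, centers))
        e = d - d_hat
        gamma = kernel(z, z)            # real and positive for this kernel
        a.append(mu * e / gamma)
        centers.append(z)
        errors.append(e)
    return a, centers, errors


# Feeding the same sample repeatedly: the error shrinks by a factor (1 - mu) per step.
a, centers, errors = ncklms2([((1 + 1j,), 2 - 1j)] * 60, mu=0.25)
print(abs(errors[0]), abs(errors[-1]))
```

Unlike the real Gaussian kernel, this kernel is complex valued off the diagonal, so the coefficients `a` and the output are combined with a single complex sum instead of separate real and imaginary sums.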



C. Sparsification 

The main drawback of any kernel based adaptive filtering algorithm is that a growing number of
training points, $z(n)$, is involved, as is apparent from (27), (29) in the case of complex KLMS. Hence,
increasing memory and computational resources are needed, as time evolves. Several strategies have been
proposed to cope with this problem and to come up with sparse solutions. In this paper, we employ the
well known novelty criterion [14], [37]. In novelty criterion online sparsification, a dictionary of points,
$\mathcal{C}$, is formed and updated appropriately. Whenever a new data pair $(\Phi(z_n), d_n)$ is considered, a decision
is immediately made of whether to add the new center, $\Phi(z(n))$, to the dictionary of centers $\mathcal{C}$. The
decision is reached following two simple rules. First, the distance of the new center, $\Phi(z(n))$, from the
current dictionary is evaluated: $dis = \min_{c_k \in \mathcal{C}} \{ \|\Phi(z(n)) - c_k\|_{\mathbb{H}} \}$. If this distance is smaller than a
given threshold $\delta_1$ (i.e., the new center is close to the existing dictionary), then the center is not added
to $\mathcal{C}$. Otherwise, we compute the prediction error $e_n = d_n - \hat{d}_n$. If $|e_n|$ is smaller than a predefined
threshold $\delta_2$, then the new center is discarded. Only if $|e_n| \ge \delta_2$ is the new center $\Phi(z(n))$ added to the
dictionary.

Fig. 1. The equalization task.

An alternative method has been considered in [4], which results in an exponential forgetting mechanism
of past data. In [6], [38], the sliding window rationale has been considered. In all the implementations
of CKLMS that are presented in this paper the novelty criterion was adopted.
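The two rules of the novelty criterion described above can be sketched as a single decision function. The sketch makes several assumptions that are not in the text: the dictionary stores the input points $z$ and distances are computed through kernel evaluations; the default distance is the one induced by the complexified real Gaussian kernel, for which $\langle \boldsymbol{\Phi}(z), \boldsymbol{\Phi}(c) \rangle_{\mathbb{H}} = 2\kappa(z, c)$; an empty dictionary is treated as infinitely far away; and all function names are hypothetical.

```python
import math


def gaussian_kernel(z, c, sigma=5.0):
    """Real Gaussian kernel on complex vectors viewed as real (x, y) vectors."""
    return math.exp(-sum(abs(zi - ci) ** 2 for zi, ci in zip(z, c)) / sigma ** 2)


def complexified_distance(z, c, kernel=gaussian_kernel):
    """Feature-space distance in the complexified RKHS, via kernel evaluations only."""
    sq = 2.0 * (kernel(z, z) - 2.0 * kernel(z, c) + kernel(c, c))
    return math.sqrt(max(sq, 0.0))


def novelty_accept(z, error, dictionary, delta1, delta2, distance=complexified_distance):
    """Return True when the new center z should be added to the dictionary."""
    if dictionary and min(distance(z, c) for c in dictionary) < delta1:
        return False                    # rule 1: too close to an existing center
    return abs(error) >= delta2         # rule 2: keep only if the error is large enough


dictionary = [(0j,)]
print(novelty_accept((0.01 + 0j,), 1.0, dictionary, delta1=0.15, delta2=0.2))  # near duplicate
print(novelty_accept((3 + 0j,), 0.5, dictionary, delta1=0.15, delta2=0.2))     # novel, large error
```

For a pure complex kernel the distance would instead be computed from $\kappa(z, z) - 2\Re[\kappa(z, c)] + \kappa(c, c)$, which can be passed in through the `distance` argument.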

VIII. Experiments 

The performance of CKLMS1 and CKLMS2 has been tested in the context of: a) a nonlinear channel
equalization task (see Figure 1) and b) a nonlinear channel identification task.

A. Channel Equalization 

For the first case, two nonlinear channels have been considered. The first channel (labeled as soft
nonlinear channel in the figures) consists of a linear filter:

$$t(n) = (-0.9 + 0.8i) \cdot s(n) + (0.6 - 0.7i) \cdot s(n - 1)$$

and a memoryless nonlinearity

$$q(n) = t(n) + (0.1 + 0.15i) \cdot t^2(n) + (0.06 + 0.05i) \cdot t^3(n).$$



The second one (labeled as strong nonlinear channel in the figures) comprises the same linear filter
and the nonlinearity:

$$q(n) = t(n) + (0.2 + 0.25i) \cdot t^2(n) + (0.12 + 0.09i) \cdot t^3(n).$$

These are standard models that have been extensively used in the literature for such tasks ID. At the
receiver end of the channels, the signal is corrupted by white Gaussian noise and then observed as $r(n)$.
The level of the noise was set to 16dB. The input signal that was fed to the channels had the form

(31)

where $X(n)$ and $Y(n)$ are Gaussian random variables. This input is circular for $\rho = \sqrt{2}/2$ and highly
non-circular if $\rho$ approaches 0 or 1 [18]. Note that the issue of circularity is very important in complex
adaptive filtering. Circularity is intimately related to rotation in the geometric sense. A complex random
variable $Z$ is called circular, if for any angle $\phi$ both $Z$ and $Z e^{i\phi}$ (i.e., the rotation of $Z$ by angle $\phi$) follow
the same probability distribution ifTTI. Loosely speaking, non-circularity adds some form of nonlinearity
to the signal. It can be proved that widely linear estimation (i.e., linear estimation in both $z$ and $z^*$)
outperforms standard linear estimation for general (i.e., circular or non-circular) complex signals. For
circular signals, the two models lead to identical results [16], [39].

The aim of a channel equalization task is to construct an inverse filter, which acts on the output $r(n)$
and reproduces the original input signal as closely as possible. To this end, we apply the NCKLMS1 and
the NCKLMS2 algorithms to the set of samples

$$((r(n + D), r(n + D - 1), \dots, r(n + D - L + 1)), s(n)),$$

where $L > 0$ is the filter length and $D$ the equalization time delay, which is present in almost any
equalization set up.

Experiments were conducted on a set of 5000 samples of the input signal (31), considering both the
circular and the non-circular cases. The results are compared with the NCLMS and the WL-NCLMS (i.e.,
widely linear NCLMS) algorithms and with two adaptive nonlinear algorithms: a) the CNGD algorithm,
which is thoroughly described in ifPTI, and b) a Multi Layer Perceptron (MLP) with 50 nodes in the hidden
layer (proposed in [18]). In both cases, the complex tanh activation function was employed. Note that
the WL-NCLMS has been recently used as an alternative to the CLMS, in an attempt to cope with
non-circularity as well as with soft nonlinearities. In all algorithms, the step update parameter, $\mu$, is
tuned for the best possible results (in terms of the steady-state error rate). For the case of the MLP,
the design was also tuned so that the best possible results were obtained. Time delay $D$ was also set
for optimality. Figure 2 shows the learning curves of the NCKLMS1 using the real Gaussian kernel
$\kappa(x, y) = \exp(-\|x - y\|^2 / \sigma^2)$ (with $\sigma = 5$) and the NCKLMS2 using the complex Gaussian kernel
$\kappa_{\sigma, \mathbb{C}^d}(z, w) := \exp\left(-\frac{\sum_{i=1}^{d} (z_i - w_i^*)^2}{\sigma^2}\right)$ (with $\sigma = 5$), together with those obtained from the NCLMS
and the WL-NCLMS algorithms. Figure 3 shows the learning curves of the NCKLMS1 and NCKLMS2
versus the CNGD and the L-50-1 MLP. Finally, Figure 4 compares the learning curves of NCKLMS1
versus a split channel approach, that treats the complex signal as two real ones using the KLMS.

The novelty criterion was used for the sparsification of the NCKLMS1 with $\delta_1 = 0.15$ and $\delta_2 = 0.2$ and
of the NCKLMS2 with $\delta_1 = 0.1$ and $\delta_2 = 0.2$. In both examples, NCKLMS1 considerably outperforms
the linear, widely linear (i.e., NCLMS and WL-NCLMS) and nonlinear (CNGD and MLP) algorithms
(see Figures 2, 3). The NCKLMS2 also exhibits improved performance compared to the linear, widely
linear and nonlinear algorithms. However, in both cases, this enhanced behavior comes at a price in
computational complexity, since the NCKLMS requires the evaluation of the kernel function. In terms
of the required computer time, the complexity of CKLMS1 and CKLMS2 is of the same order as the
complexity of the MLP. Comparing the NCKLMS1 and the NCKLMS2, the experiments show that the
results differ, with the former one leading to an improved performance. Finally, Figure 4 illustrates that
the split channel approach performs poorly compared to the NCKLMS1, especially in the circular case,
as it cannot capture the correlation between the two real channels.
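The two equalization channels and the training pairs used in this subsection can be sketched as follows. Several points are assumptions of this sketch: the coefficient $(-0.9 + 0.8i)$ of the linear filter is read from a damaged line of the source and should be checked against the original paper, $s(-1)$ is taken as 0, the additive noise and the input signal model are left out (the caller supplies $s$ and adds noise to obtain $r$), and all function names are hypothetical.

```python
def linear_filter(s):
    """t(n) = (-0.9 + 0.8i) s(n) + (0.6 - 0.7i) s(n-1), with s(-1) = 0 (assumed)."""
    t = []
    for n, sn in enumerate(s):
        prev = s[n - 1] if n > 0 else 0j
        t.append((-0.9 + 0.8j) * sn + (0.6 - 0.7j) * prev)
    return t


def soft_nonlinearity(t):
    """Memoryless nonlinearity of the soft nonlinear channel."""
    return t + (0.1 + 0.15j) * t ** 2 + (0.06 + 0.05j) * t ** 3


def strong_nonlinearity(t):
    """Memoryless nonlinearity of the strong nonlinear channel."""
    return t + (0.2 + 0.25j) * t ** 2 + (0.12 + 0.09j) * t ** 3


def equalization_pairs(r, s, L, D):
    """Pairs ((r(n+D), ..., r(n+D-L+1)), s(n)) for every n whose indices are all valid."""
    pairs = []
    for n in range(len(s)):
        low, high = n + D - L + 1, n + D
        if low >= 0 and high < len(r):
            pairs.append((tuple(r[high - j] for j in range(L)), s[n]))
    return pairs


s = [1 + 0j, 1j, -1 + 0j, -1j, 1 + 1j, 0.5 - 0.5j]
q = [soft_nonlinearity(t) for t in linear_filter(s)]   # noise-free channel output
pairs = equalization_pairs(q, s, L=5, D=2)
print(len(pairs), len(pairs[0][0]))
```

Each pair can be fed directly to an online learner that expects an input vector in $\mathbb{C}^L$ and a complex desired response.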

B. Channel Identification 

The nonlinear channel that was considered (see [18]) consists of a linear filter:

$$t(n) = \sum_{k=1}^{5} h(k) \cdot s(n - k + 1),$$

where 

h(k) = 0.432 (l + cos (Hzfc^T) _ ( : + cos 

for $k = 1, \dots, 5$, and the nonlinear component:

$$x(n) = t(n) + (0.15 - 0.1i) t^2(n).$$

Similar to the equalization case, the input signal that was fed to the channel had the form (31). Experiments were
conducted on a set of 10000 samples of the input signal (31), corrupted by white Gaussian noise, considering both
the circular and the non-circular case. The level of the noise was set to 18dB. Figure 5 shows the learning curves
of the NCKLMS1 and the NCKLMS2 together with those obtained from the CNGD and the L-50-1 MLP. In
this example, also, NCKLMS1 considerably outperforms both the CNGD and the L-50-1 MLP. The NCKLMS2,
although it performs better than the MLP and the CNGD, is inferior to NCKLMS1.

Fig. 2. Learning curves for KCLMS1 ($\mu = 1/2$), KCLMS2 ($\mu = 1/4$), CLMS ($\mu = 1/16$) and WL-CLMS ($\mu = 1/16$)
(filter length $L = 5$, delay $D = 2$) for the soft nonlinear channel equalization problem, for (a) the circular input case, (b) the
non-circular input case ($\rho = 0.1$).

Fig. 3. Learning curves for KCLMS1 ($\mu = 1/2$), KCLMS2 ($\mu = 1/4$), CNGD and L-50-1 MLP (filter length $L = 5$, delay
$D = 2$) for the hard nonlinear channel equalization problem, for (a) the circular input case, (b) the non-circular input case
($\rho = 0.1$).
Fig. 4. Learning curves for KCLMS1 ($\mu = 1/2$) and Dual Channel Real KLMS ($\mu = 1/2$) for the soft nonlinear channel
equalization problem, for (a) the circular input case, (b) the non-circular input case ($\rho = 0.1$). Both panels are titled
"Kernel NCLMS versus Dual Channel Real KNLMS".

Fig. 5. Learning curves for KCLMS1 ($\mu = 1/2$), KCLMS2 ($\mu = 1/4$), CNGD and L-50-1 MLP (filter length $L = 5$) for the
nonlinear channel identification problem, for (a) the circular input case, (b) the non-circular input case ($\rho = 0.1$).



IX. Conclusions 

A new framework for kernel adaptive filtering for complex signal processing has been developed. The proposed
methodology, besides providing a skeleton for working with pure complex kernels, allows for the construction of
complex RKHSs from real ones, through a technique called complexification of RKHSs. Such an approach provides
the advantage of working with popular and well understood real kernels in the complex domain. It has to be pointed
out that our method is a general one and can be used with any type of real and/or complex kernels that have been or can be
developed. To the best of our knowledge, this is the first time that a methodology for complex adaptive processing
in RKHSs has been proposed. Wirtinger's calculus has been extended to cope with the problem of differentiation in the
involved (infinite) dimensional Hilbert spaces. The derived rules and properties of the extended Wirtinger's calculus
on complex RKHSs turn out to be similar in structure to the special case of finite dimensional complex spaces. The
proposed framework was applied to the complex LMS and two realizations of the complex Kernel LMS algorithm
were developed. Experiments, which were performed on both the equalization and the identification problem of a
nonlinear channel, for both circular and non-circular input data, showed a significant decrease in the steady state
mean square error, compared with other known linear, widely linear and nonlinear techniques, while retaining
fast convergence.

Appendix A 
Proof of Proposition VI.2

We start with a lemma that will be used to prove the claim. 

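Lemma A.1. Consider the Hilbert space $\mathbb{H}$ and $a, b \in \mathbb{H}$. The limit

$$\lim_{\|h\|_{\mathbb{H}} \to 0} \frac{\langle h^*, a \rangle_{\mathbb{H}} - \langle h, b \rangle_{\mathbb{H}}}{\|h\|_{\mathbb{H}}} = 0, \qquad (32)$$

if and only if $a = b = 0$.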

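Consider the first order Taylor expansions of $T_r$ and $T_i$ at $c = c_1 + i c_2 = (c_1, c_2)$:

$$T_r(c + h) = T_r(c) + \langle h_1, \nabla_u T_r(c) \rangle_{\mathcal{H}} + \langle h_2, \nabla_v T_r(c) \rangle_{\mathcal{H}} + o(\|h\|_{\mathcal{H}^2}),$$
$$T_i(c + h) = T_i(c) + \langle h_1, \nabla_u T_i(c) \rangle_{\mathcal{H}} + \langle h_2, \nabla_v T_i(c) \rangle_{\mathcal{H}} + o(\|h\|_{\mathcal{H}^2}).$$

Multiplying the second relation by $i$ and adding it to the first one, we obtain:

$$T(c + h) = T(c) + \langle h_1, \nabla_u T_r(c) - i \nabla_u T_i(c) \rangle_{\mathbb{H}} + \langle h_2, \nabla_v T_r(c) - i \nabla_v T_i(c) \rangle_{\mathbb{H}} + o(\|h\|_{\mathbb{H}}).$$

To simplify the notation we may define

$$\nabla_u T(c) = \nabla_u T_r(c) + i \nabla_u T_i(c), \qquad \nabla_v T(c) = \nabla_v T_r(c) + i \nabla_v T_i(c)$$

and obtain:

$$T(c + h) = T(c) + \langle h_1, (\nabla_u T(c))^* \rangle_{\mathbb{H}} + \langle h_2, (\nabla_v T(c))^* \rangle_{\mathbb{H}} + o(\|h\|_{\mathbb{H}}).$$

Next, we substitute $h_1$ and $h_2$ using the relations $h_1 = \frac{h + h^*}{2}$ and $h_2 = \frac{h - h^*}{2i}$ and use the sesquilinear property of
the inner product of $\mathbb{H}$:

$$T(c + h) = T(c) + \frac{1}{2} \langle h, (\nabla_u T(c) - i \nabla_v T(c))^* \rangle_{\mathbb{H}} + \frac{1}{2} \langle h^*, (\nabla_u T(c) + i \nabla_v T(c))^* \rangle_{\mathbb{H}} + o(\|h\|_{\mathbb{H}}).$$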
It has already been shown that equation (21) is essential for the development of Wirtinger's calculus. To complete
the proof of the proposition we compute the fraction that appears in the definition of the complex Frechet derivative:

$$\frac{T(c + h) - T(c) - \langle h, w \rangle_{\mathbb{H}}}{\|h\|_{\mathbb{H}}} = \frac{\frac{1}{2} \langle h, (\nabla_u T(c) - i \nabla_v T(c))^* \rangle_{\mathbb{H}} + \frac{1}{2} \langle h^*, (\nabla_u T(c) + i \nabla_v T(c))^* \rangle_{\mathbb{H}} - \langle h, w \rangle_{\mathbb{H}}}{\|h\|_{\mathbb{H}}} + \frac{o(\|h\|_{\mathbb{H}})}{\|h\|_{\mathbb{H}}}.$$

Recall that, since $o(\|h\|_{\mathbb{H}}) / \|h\|_{\mathbb{H}} \to 0$ as $\|h\|_{\mathbb{H}} \to 0$, for this limit to exist and vanish, it is necessary that
$\nabla_u T(c) + i \nabla_v T(c) = 0$ and $w^* = \frac{1}{2} (\nabla_u T(c) - i \nabla_v T(c))$ (see Lemma A.1). However, according to our definition,

$$\nabla_u T(c) + i \nabla_v T(c) = (\nabla_u T_r(c) - \nabla_v T_i(c)) + i (\nabla_u T_i(c) + \nabla_v T_r(c)).$$

Thus, $T$ is differentiable in the Frechet complex sense, iff the Cauchy-Riemann conditions hold. Moreover, in this
case:

$$\nabla T(c) = \nabla_u T_r(c) + i \nabla_u T_i(c) = \nabla_v T_i(c) - i \nabla_v T_r(c).$$
Appendix B 

Properties of Wirtinger's Derivatives on complex Hilbert spaces 

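Below we give a complete list of the main properties of the extended Wirtinger's Calculus in complex Hilbert
spaces. A rigorous and detailed presentation of the theory, as well as the proofs of all these properties, can be found
in [35].

1) If $T(f)$ is $f$-holomorphic at $c$ (i.e., it has a Taylor series expansion with respect to $f$ around $c$), then its
Frechet W-derivative at $c$ degenerates to the standard Frechet complex derivative and its Frechet CW-derivative
vanishes, i.e., $\nabla_{f^*} T(c) = 0$.

2) If $T(f)$ is $f^*$-holomorphic at $c$ (i.e., it has a Taylor series expansion with respect to $f^*$ around $c$), then
$\nabla_f T(c) = 0$.

3) $(\nabla_f T(c))^* = \nabla_{f^*} T^*(c)$.

4) $(\nabla_{f^*} T(c))^* = \nabla_f T^*(c)$.

5) If $T$ is real valued, then $(\nabla_f T(c))^* = \nabla_{f^*} T(c)$.

6) The first order Taylor expansion around $f \in \mathbb{H}$ is given by

$$T(f + h) = T(f) + \langle h, (\nabla_f T(f))^* \rangle_{\mathbb{H}} + \langle h^*, (\nabla_{f^*} T(f))^* \rangle_{\mathbb{H}}.$$

7) If $T(f) = \langle f, w \rangle_{\mathbb{H}}$, then $\nabla_f T(c) = w^*$, $\nabla_{f^*} T(c) = 0$, for every $c$.

8) If $T(f) = \langle w, f \rangle_{\mathbb{H}}$, then $\nabla_f T(c) = 0$, $\nabla_{f^*} T(c) = w$, for every $c$.

9) If $T(f) = \langle f^*, w \rangle_{\mathbb{H}}$, then $\nabla_f T(c) = 0$, $\nabla_{f^*} T(c) = w^*$, for every $c$.

10) If $T(f) = \langle w, f^* \rangle_{\mathbb{H}}$, then $\nabla_f T(c) = w$, $\nabla_{f^*} T(c) = 0$, for every $c$.

11) Linearity: If $T, S : \mathbb{H} \to \mathbb{C}$ are Frechet differentiable in the real sense at $c \in \mathbb{H}$ and $\alpha, \beta \in \mathbb{C}$, then

$$\nabla_f (\alpha T + \beta S)(c) = \alpha \nabla_f T(c) + \beta \nabla_f S(c),$$
$$\nabla_{f^*} (\alpha T + \beta S)(c) = \alpha \nabla_{f^*} T(c) + \beta \nabla_{f^*} S(c).$$

12) Product Rule: If $T, S : \mathbb{H} \to \mathbb{C}$ are Frechet differentiable in the real sense at $c \in \mathbb{H}$, then:

$$\nabla_f (T \cdot S)(c) = \nabla_f T(c) S(c) + T(c) \nabla_f S(c),$$
$$\nabla_{f^*} (T \cdot S)(c) = \nabla_{f^*} T(c) S(c) + T(c) \nabla_{f^*} S(c).$$

13) Division Rule: If $T, S : \mathbb{H} \to \mathbb{C}$ are Frechet differentiable in the real sense at $c \in \mathbb{H}$ and $S(c) \ne 0$, then:

$$\nabla_f \left( \frac{T}{S} \right)(c) = \frac{\nabla_f T(c) S(c) - T(c) \nabla_f S(c)}{S^2(c)},$$
$$\nabla_{f^*} \left( \frac{T}{S} \right)(c) = \frac{\nabla_{f^*} T(c) S(c) - T(c) \nabla_{f^*} S(c)}{S^2(c)}.$$

14) Chain Rule: If $T : \mathbb{H} \to \mathbb{C}$ is Frechet differentiable at $c \in \mathbb{H}$, $S : \mathbb{C} \to \mathbb{C}$ is differentiable in the real sense
at $T(c) \in \mathbb{C}$, then:

$$\nabla_f S \circ T(c) = \frac{\partial S}{\partial z}(T(c)) \nabla_f T(c) + \frac{\partial S}{\partial z^*}(T(c)) \nabla_f (T^*)(c),$$
$$\nabla_{f^*} S \circ T(c) = \frac{\partial S}{\partial z}(T(c)) \nabla_{f^*} T(c) + \frac{\partial S}{\partial z^*}(T(c)) \nabla_{f^*} (T^*)(c).$$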

The proofs of properties 1 and 2 are rather obvious. Here, we give the proofs of properties 3, 7 and 11, which 
have been used to derive the main results of this paper. 
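Before the proofs, properties 3 and 12 can be sanity-checked numerically in the scalar case. The sketch assumes $\mathcal{H} = \mathbb{R}$ (so $\mathbb{H} = \mathbb{C}$), approximates the gradients by central differences, and uses the illustrative operators $T(f) = f^2$ and $S(f) = |f|^2$; the helper name `wirtinger_gradients` is hypothetical.

```python
def wirtinger_gradients(T, c, eps=1e-6):
    """Return (W-gradient, CW-gradient) of T at c via central differences."""
    dT_du = (T(c + eps) - T(c - eps)) / (2 * eps)
    dT_dv = (T(c + 1j * eps) - T(c - 1j * eps)) / (2 * eps)
    return 0.5 * (dT_du - 1j * dT_dv), 0.5 * (dT_du + 1j * dT_dv)


def T(f):
    return f * f            # f-holomorphic


def S(f):
    return abs(f) ** 2      # real valued, not holomorphic


c = 0.4 + 0.9j
gT, gT_cw = wirtinger_gradients(T, c)
gS, gS_cw = wirtinger_gradients(S, c)
gP, gP_cw = wirtinger_gradients(lambda f: T(f) * S(f), c)

# Property 12 (product rule), evaluated from the factor gradients.
rule_w = gT * S(c) + T(c) * gS
rule_cw = gT_cw * S(c) + T(c) * gS_cw
print(abs(gP - rule_w), abs(gP_cw - rule_cw))

# Property 3: the conjugate of the W-gradient of T equals the CW-gradient of T*.
_, g_conj_cw = wirtinger_gradients(lambda f: T(f).conjugate(), c)
print(abs(gT.conjugate() - g_conj_cw))
```

All three printed differences should be at the level of the finite-difference error.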

Proof of property 3: The existence of $\nabla_f T(c)$ and $\nabla_{f^*} T(c)$ is guaranteed by the Frechet differentiability of
$T$ at $c$ (in the real sense). To obtain the result, observe that:

$$(\nabla_f T(c))^* = \frac{1}{2} (\nabla_u T_r(c) + \nabla_v T_i(c)) - \frac{i}{2} (\nabla_u T_i(c) - \nabla_v T_r(c))$$
$$= \frac{1}{2} (\nabla_u T_r(c) - \nabla_v (-T_i)(c)) + \frac{i}{2} (\nabla_u (-T_i)(c) + \nabla_v T_r(c))$$
$$= \nabla_{f^*} T^*(c).$$



Property 4 can be proved similarly. 
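Proof of property 7: Considering the definition of the Frechet complex derivative (see equation (17)), we observe
that:

$$T(c + h) - T(c) - \langle h, g \rangle_{\mathbb{H}} = \langle c + h, w \rangle_{\mathbb{H}} - \langle c, w \rangle_{\mathbb{H}} - \langle h, g \rangle_{\mathbb{H}} = \langle h, w \rangle_{\mathbb{H}} - \langle h, g \rangle_{\mathbb{H}}.$$

Thus, $T$ is Frechet complex differentiable at $c$, with $\nabla T(c) = w^*$ and, from property 1, $\nabla_{f^*} T(c) = 0$ and $\nabla_f T(c) =
w^*$. ■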
Proof of property 11: Let $T(f) = T_r(u_f, v_f) + i T_i(u_f, v_f)$, $S(f) = S(u_f + i v_f) = S_r(u_f, v_f) + i S_i(u_f, v_f)$
be two complex functions and $\alpha, \beta \in \mathbb{C}$, such that $\alpha = \alpha_1 + i \alpha_2$, $\beta = \beta_1 + i \beta_2$. Then $R(f) = \alpha T(f) + \beta S(f)$
and the Frechet W-derivative of $R$ will be given by:

$$\nabla_f R(c) = \frac{1}{2} (\nabla_u R_r(c) + \nabla_v R_i(c)) + \frac{i}{2} (\nabla_u R_i(c) - \nabla_v R_r(c)).$$

Applying the linearity property of the ordinary Frechet derivative, after some algebra we obtain the result. For the
second part, in view of properties 3, 4 and the linearity property of the Frechet W-derivative, the Frechet CW-derivative
of $R$ at $c$ will be given by:

$$\nabla_{f^*} R(c) = \nabla_{f^*} (\alpha T + \beta S)(c) = (\nabla_f (\alpha T + \beta S)^*(c))^*$$
$$= (\nabla_f (\alpha^* T^* + \beta^* S^*)(c))^* = (\alpha^* \nabla_f T^*(c) + \beta^* \nabla_f S^*(c))^*$$
$$= \alpha (\nabla_f T^*(c))^* + \beta (\nabla_f S^*(c))^* = \alpha \nabla_{f^*} T(c) + \beta \nabla_{f^*} S(c),$$

which completes the proof. ■







Pantelis Bouboulis (M'10) received the M.Sc. and Ph.D. degrees in informatics and telecommunications
from the National and Kapodistrian University of Athens, Greece, in 2002 and 2006, respectively. From
2007 to 2008, he served as an Assistant Professor in the Department of Informatics and Telecommunications,
University of Athens. His current research interests lie in the areas of machine learning, fractals,
wavelets and image processing. He is a member of AMS and IEEE.



Sergios Theodoridis (F'08) is currently Professor of signal processing and communications in the Department
of Informatics and Telecommunications, University of Athens, Athens, Greece. His research interests
lie in the areas of adaptive algorithms and communications, machine learning and pattern recognition, and
signal processing for audio processing and retrieval. He is the co-editor of the book Efficient Algorithms
for Signal Processing and System Identification (Prentice-Hall, 1993), coauthor of the best-selling book
Pattern Recognition (Academic, 4th ed., 2008), and the coauthor of three books in Greek, two of them for
the Greek Open University. He is currently an Associate Editor of the IEEE TRANSACTIONS ON NEURAL NETWORKS, the
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II, and a member of the editorial board of the EURASIP Wireless
Communications and Networking. He has served in the past as an Associate Editor of the IEEE TRANSACTIONS ON SIGNAL
PROCESSING, the IEEE Signal Processing Magazine, the EURASIP Journal on Signal Processing, and the EURASIP Journal
on Advances in Signal Processing. He was the general chairman of EUSIPCO 1998, the Technical Program cochair for ISCAS
2006, and cochairman of ICIP 2008. He has served as President of the European Association for Signal Processing (EURASIP)
and he is currently a member of the Board of Governors for the IEEE CAS Society. He is the coauthor of four papers that
have received best paper awards, including the 2009 IEEE Computational Intelligence Society Transactions on Neural Networks
Outstanding Paper Award. He serves as an IEEE Signal Processing Society Distinguished Lecturer. He is a member of the
Greek National Council for Research and Technology and Chairman of the SP advisory committee for the Edinburgh Research
Partnership (ERP). He has served as vice chairman of the Greek Pedagogical Institute and, for four years, he was a member of
the Board of Directors of COSMOTE (the Greek mobile phone operating company). He is a Fellow of IET and a Corresponding
Fellow of FRSE.






